
Advanced RDBMS UNIT IV

Topics: Concurrency Control Techniques; Locking Techniques for Concurrency Control; Concurrency Control Based on Timestamp Ordering; Validation (Optimistic) Concurrency Control Techniques; Granularity of Data Items and Multiple Granularity Locking; Using Locks for Concurrency Control in Indexes; Database Recovery Techniques: Recovery Concepts, Recovery Techniques Based on Deferred Update / Immediate Update / Shadow Paging, and the ARIES Recovery Algorithm; Database Security and Authorization.

4.0 Introduction
Concurrency control enforces isolation among conflicting transactions in a database management system. It preserves the consistency of individual data items and of the database as a whole, which in turn improves the reliability and usability of the system.

4.1 Objective
The objective of this unit is to learn and understand concurrency control techniques based on locking, timestamp ordering and validation; granularity of data items and multiple granularity locking; database recovery techniques; and database security and authorization.

4.2 Contents
4.2.1 Concurrency Control Techniques: Locking Techniques for Concurrency Control
Purpose of Concurrency Control
To enforce isolation (through mutual exclusion) among conflicting transactions. To preserve database consistency through consistency-preserving execution of transactions. To resolve read-write and write-write conflicts.
Example: In a concurrent execution environment, if T1 conflicts with T2 over a data item A, the concurrency control mechanism decides whether T1 or T2 gets A and whether the other transaction is rolled back or waits.

Two-Phase Locking Techniques
Locking is an operation which secures (a) permission to read or (b) permission to write a data item for a transaction. Example: lock(X): data item X is locked on behalf of the requesting transaction.

Unlocking is an operation which removes these permissions from the data item. Example: unlock(X): data item X is made available to all other transactions. Lock and Unlock are atomic operations.
Two-Phase Locking Techniques: Essential Components
There are two lock modes: (a) shared (read) and (b) exclusive (write).
Shared mode: read_lock(X). More than one transaction can hold a shared lock on X for reading its value, but no write lock can be applied on X by any other transaction.
Exclusive mode: write_lock(X). Only one write lock on X can exist at any time, and no shared lock can be applied on X by any other transaction.
Lock manager: manages the locks on data items.
Lock table: the lock manager uses it to store the identity of the transaction locking a data item, the data item itself, the lock mode and a pointer to the next locked data item. One simple way to implement a lock table is through a linked list.

The database requires that all transactions be well-formed. A transaction is well-formed if: it locks a data item before it reads or writes it; it does not lock an already locked data item; and it does not try to unlock a free data item.

The following code performs the lock operation:

B: if LOCK(X) = 0 (* item is unlocked *)
   then LOCK(X) <- 1 (* lock the item *)
   else begin
       wait (until LOCK(X) = 0 and the lock manager wakes up the transaction);
       go to B
   end;

The following code performs the unlock operation:

LOCK(X) <- 0; (* unlock the item *)
if any transactions are waiting
   then wake up one of the waiting transactions;

The following code performs the read / write (shared / exclusive) locking operations:
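As a hedged illustration (my sketch, not the textbook's original pseudocode), the shared/exclusive locking operations can be written in Python as follows, assuming a simple in-memory lock table keyed by data item and a scheduler that queues waiters rather than blocking threads:

lock_table = {}

def read_lock(item, txn):
    entry = lock_table.get(item)
    if entry is None:                       # item is unlocked: grant a shared lock
        lock_table[item] = {"mode": "read", "holders": {txn}, "waiting": []}
    elif entry["mode"] == "read":           # already share-locked: join the holders
        entry["holders"].add(txn)
    else:                                   # write-locked by another transaction: wait
        entry["waiting"].append((txn, "read"))

def write_lock(item, txn):
    entry = lock_table.get(item)
    if entry is None:                       # item is unlocked: grant an exclusive lock
        lock_table[item] = {"mode": "write", "holders": {txn}, "waiting": []}
    else:                                   # any existing lock conflicts with a write
        entry["waiting"].append((txn, "write"))

def unlock(item, txn):
    entry = lock_table[item]
    entry["holders"].discard(txn)
    if not entry["holders"]:                # last holder released: wake a waiter or drop
        if entry["waiting"]:
            next_txn, mode = entry["waiting"].pop(0)
            entry["mode"], entry["holders"] = mode, {next_txn}
        else:
            del lock_table[item]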


Two-Phase Locking Techniques: The algorithm


There are two phases: (a) Locking (Growing) and (b) Unlocking (Shrinking).
Locking (Growing) phase: a transaction applies locks (read or write) on desired data items one at a time.
Unlocking (Shrinking) phase: a transaction releases its locked data items one at a time.
Requirement: for a transaction these two phases must be mutually exclusive; that is, during the locking phase the unlocking phase must not start, and during the unlocking phase the locking phase must not begin. To guarantee serializability, all lock operations (S_lock or X_lock) in a transaction must precede the first unlock operation; no lock can be acquired after the first lock is released. A transaction obeying these rules is said to satisfy the two-phase locking protocol. The two-phase execution thus involves a growing phase (lock acquisition only, no unlocks) and a shrinking phase (lock release only, no further locks). A lock point divides the two phases, as shown below:
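For illustration (this example is mine, not the original figure), consider a transaction that obeys the two-phase rule; its lock point is the moment the last lock is acquired:

T: read_lock(X); read_item(X);
   write_lock(Y);               <- lock point (last lock acquired; growing phase ends)
   unlock(X);                   <- shrinking phase begins
   read_item(Y); write_item(Y);
   unlock(Y);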


Two-phase policy generates two locking algorithms: (a) Basic and (b) Conservative.
Conservative: prevents deadlock by locking all desired data items before the transaction begins execution.
Basic: the transaction locks data items incrementally; this may cause deadlock, which must then be dealt with.
Strict: a stricter version of the Basic algorithm, in which unlocking is performed only after a transaction terminates (commits, or aborts and is rolled back). This is the most commonly used two-phase locking algorithm.

Dealing with Deadlock and Starvation
Deadlock: T1 and T2 below both follow the two-phase policy, but they become deadlocked:

T1: read_lock(Y); read_item(Y);          T2: read_lock(X); read_item(X);
T1: write_lock(X); (waits for X)         T2: write_lock(Y); (waits for Y)

Deadlock (T1 and T2).

Deadlock prevention
A transaction locks all data items it refers to before it begins execution. This way of locking prevents deadlock, since a transaction never waits for a data item. The conservative two-phase locking technique uses this approach. Deadlock occurs when each of two transactions is waiting for the other to release the lock on an item.

In general a deadlock may involve n (n>2) transactions, and can be detected by using a wait-for graph.

Deadlock detection and resolution
In this approach, deadlocks are allowed to happen. The scheduler maintains a wait-for graph for detecting cycles. If a cycle exists, then one transaction involved in the cycle is selected (the victim) and rolled back. The wait-for graph is created using the lock table: as soon as a transaction is blocked, it is added to the graph. When a chain like "Ti waits for Tj, which waits for Tk, which waits for Ti (or Tj)" occurs, this creates a cycle; one of the transactions in the cycle is selected and rolled back.
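A minimal sketch of cycle detection on a wait-for graph, assuming the graph is kept as a dictionary mapping each blocked transaction to the transactions it waits for (names are illustrative, not from the text):

def find_cycle(wait_for):
    """Return a list of transactions forming a cycle, or None.
    wait_for: dict mapping a transaction id to the set of ids it waits for."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in wait_for}
    def dfs(t, path):
        colour[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            if colour.get(u, WHITE) == GREY:       # back edge: cycle found
                return path[path.index(u):]
            if colour.get(u, WHITE) == WHITE:
                found = dfs(u, path)
                if found:
                    return found
        colour[t] = BLACK
        path.pop()
        return None
    for t in list(wait_for):
        if colour[t] == WHITE:
            cycle = dfs(t, [])
            if cycle:
                return cycle
    return None

# Example: T1 waits for T2 and T2 waits for T1, so a victim must be rolled back.
print(find_cycle({"T1": {"T2"}, "T2": {"T1"}}))    # ['T1', 'T2']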

Deadlock avoidance
There are many variations of the two-phase locking algorithm. Some avoid deadlock by not letting a cycle complete: as soon as the algorithm discovers that blocking a transaction is likely to create a cycle, it rolls back the transaction. The Wound-Wait and Wait-Die algorithms use timestamps to avoid deadlocks by rolling back a victim. For example:
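A hedged sketch of the two timestamp-based schemes (my illustration, not the original example): in Wait-Die an older requester is allowed to wait while a younger requester dies; in Wound-Wait an older requester wounds (aborts) the younger lock holder, while a younger requester waits.

def wait_die(requester_ts, holder_ts):
    # Older (smaller timestamp) requester may wait; a younger requester is aborted.
    return "wait" if requester_ts < holder_ts else "abort(requester)"

def wound_wait(requester_ts, holder_ts):
    # Older requester wounds (aborts) the younger holder; a younger requester waits.
    return "abort(holder)" if requester_ts < holder_ts else "wait"

# Example: T1 (ts=5, older) requests an item held by T2 (ts=9, younger).
print(wait_die(5, 9))     # wait
print(wound_wait(5, 9))   # abort(holder)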

A deadlock can be viewed as a cycle in the wait-for graph, and it is broken by rolling back any one of the transactions causing it. Dealing with deadlocks involves:
1. Deadlock detection and resolution, which comprises the following steps:
a. Construct the wait-for graph.
b. Check periodically for deadlocks using graph algorithms; the checking frequency may depend on the waiting time of transactions and the number of concurrent transactions.
c. When a deadlock occurs, take action: select a victim and abort it, while watching out for starvation.
2. Deadlock avoidance, which can be done in two ways:
a. Acquiring all the locks at once (less concurrency).
b. Acquiring the locks in a pre-fixed order (cannot go back).

Starvation
Starvation occurs when a particular transaction consistently waits or is restarted and never gets a chance to proceed further. In deadlock resolution it is possible that the same transaction is repeatedly selected as the victim and rolled back; this limitation is inherent in all priority-based scheduling mechanisms. In the Wound-Wait scheme a younger transaction may always be wounded (aborted) by a long-running older transaction, which may create starvation.

Deadlock avoidance strategies include:
1. No waiting: if a lock cannot be obtained, abort and restart without waiting, whether or not a deadlock would actually occur.
2. Cautious waiting: wait for the lock only if the holding transaction is not itself blocked; otherwise abort.
3. Timeouts: long waits are assumed to be deadlocks and the waiting transaction is aborted.

4.2.2 Concurrency Control Based on Timestamp Ordering
Timestamp: a monotonically increasing variable (integer) indicating the age of an operation or a transaction. A larger timestamp value indicates a more recent event or operation.

Advanced RDBMS
A timestamp-based algorithm uses timestamps to serialize the execution of concurrent transactions.
Basic Timestamp Ordering
1. Transaction T issues a write_item(X) operation:
a. If read_TS(X) > TS(T) or if write_TS(X) > TS(T), then a younger transaction has already read or written the data item, so abort and roll back T and reject the operation.
b. If the condition in part (a) does not hold, then execute write_item(X) of T and set write_TS(X) to TS(T).
2. Transaction T issues a read_item(X) operation:
a. If write_TS(X) > TS(T), then a younger transaction has already written to the data item, so abort and roll back T and reject the operation.
b. If write_TS(X) <= TS(T), then execute read_item(X) of T and set read_TS(X) to the larger of TS(T) and the current read_TS(X).
Strict Timestamp Ordering
1. Transaction T issues a write_item(X) operation:
a. If TS(T) > read_TS(X), then delay T until the transaction T' that wrote or read X has terminated (committed or aborted).
2. Transaction T issues a read_item(X) operation:
a. If TS(T) > write_TS(X), then delay T until the transaction T' that wrote or read X has terminated (committed or aborted).
Thomas's Write Rule (for a write_item(X) issued by T)
1. If read_TS(X) > TS(T), then abort and roll back T and reject the operation.
2. If write_TS(X) > TS(T), then just ignore the write operation and continue execution; only the most recent write counts in the case of two consecutive writes.
3. If neither of the conditions in 1 and 2 occurs, then execute write_item(X) of T and set write_TS(X) to TS(T).
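A minimal sketch of the basic timestamp ordering checks above (illustrative names, assuming read_TS and write_TS are kept per data item):

class Reject(Exception):
    """Raised when an operation must be rejected and its transaction rolled back."""

def write_item(item, ts_t, read_ts, write_ts):
    # Rule 1: a younger transaction already read or wrote the item, so abort T.
    if read_ts[item] > ts_t or write_ts[item] > ts_t:
        raise Reject(f"abort transaction with TS {ts_t}")
    write_ts[item] = ts_t                      # execute the write and record it

def read_item(item, ts_t, read_ts, write_ts):
    # Rule 2: a younger transaction already wrote the item, so abort T.
    if write_ts[item] > ts_t:
        raise Reject(f"abort transaction with TS {ts_t}")
    read_ts[item] = max(read_ts[item], ts_t)   # execute the read and record it

# Example: item X was last written at TS 8; a transaction with TS 5 tries to read it.
read_ts, write_ts = {"X": 3}, {"X": 8}
try:
    read_item("X", 5, read_ts, write_ts)
except Reject as r:
    print(r)                                   # abort transaction with TS 5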

4.2.3 Multiversion Concurrency Control Techniques
This approach maintains a number of versions of a data item and allocates the right version to a read operation of a transaction. Thus, unlike other mechanisms, a read operation in this mechanism is never rejected. Side effect: significantly more storage (RAM and disk) is required to maintain multiple versions; to check unlimited growth of versions, garbage collection is run when some criterion is satisfied.
Multiversion technique based on timestamp ordering
Assume X1, X2, ..., Xn are the versions of a data item X created by the write operations of transactions. With each version Xi, a read_TS (read timestamp) and a write_TS (write timestamp) are associated.
read_TS(Xi): the read timestamp of Xi is the largest of all the timestamps of transactions that have successfully read version Xi.
write_TS(Xi): the write timestamp of Xi is the timestamp of the transaction that wrote the value of version Xi.
A new version of X is created only by a write operation. To ensure serializability, the following two rules are used:
1. If transaction T issues a write_item(X) operation, and the version Xi of X has the highest write_TS(Xi) of all versions of X that is also less than or equal to TS(T), and read_TS(Xi) > TS(T), then abort and roll back T; otherwise, create a new version Xj of X and set read_TS(Xj) = write_TS(Xj) = TS(T).
2. If transaction T issues a read_item(X) operation, find the version Xi of X that has the highest write_TS(Xi) of all versions of X that is also less than or equal to TS(T); then return the value of Xi to T, and set the value of read_TS(Xi) to the largest of TS(T) and the current read_TS(Xi).
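A hedged sketch of the two rules above, assuming each version is stored as a small list of the form [value, write_TS, read_TS] (names are illustrative):

def mv_read(versions, ts_t):
    # Pick the version with the highest write_TS <= TS(T); a read is never rejected.
    v = max((v for v in versions if v[1] <= ts_t), key=lambda v: v[1])
    v[2] = max(v[2], ts_t)                 # update read_TS of that version
    return v[0]

def mv_write(versions, ts_t, value):
    v = max((v for v in versions if v[1] <= ts_t), key=lambda v: v[1])
    if v[2] > ts_t:                        # a younger transaction already read it
        raise RuntimeError(f"abort transaction with TS {ts_t}")
    versions.append([value, ts_t, ts_t])   # otherwise create a new version

# Example: one committed version written at TS 2; TS 7 writes, then TS 4 reads.
X = [["a", 2, 3]]
mv_write(X, 7, "b")
print(mv_read(X, 4))    # 'a'  (the older version is returned to the older reader)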

Rule 2 above guarantees that a read will never be rejected.

Multiversion Two-Phase Locking Using Certify Locks
This scheme allows a transaction T' to read a data item X while X is write-locked by a conflicting transaction T. This is accomplished by maintaining two versions of each data item X, where one version must always have been written by some committed transaction; a write operation therefore always creates a new version of X.
Steps:
1. X is the committed version of a data item.
2. T creates a second version X' after obtaining a write lock on X.
3. Other transactions continue to read X.
4. T is ready to commit, so it obtains a certify lock on X.
5. The committed version X is replaced by X' (X' becomes the new committed version).
6. T releases its certify lock.

Compatibility tables (read/write locking scheme, and read/write/certify locking scheme):

             Read   Write
Read         yes    no
Write        no     no

             Read   Write   Certify
Read         yes    yes     no
Write        yes    no      no
Certify      no     no      no

Note: in multiversion 2PL, read and write operations from conflicting transactions can be processed concurrently. This improves concurrency, but it may delay transaction commit because certify locks must be obtained on all the items a transaction has written. The scheme avoids cascading aborts but, like the strict two-phase locking scheme, conflicting transactions may still become deadlocked.

4.2.4 Validation (Optimistic) Concurrency Control Techniques
In this technique serializability is checked only at commit time, and transactions are aborted in the case of non-serializable schedules. There are three phases:

Read phase: a transaction can read values of committed data items; however, updates are applied only to local copies (versions) of the data items (in the database cache).
Validation phase: serializability is checked before transactions write their updates to the database. This phase for Ti checks that, for each transaction Tj that is either committed or is in its validation phase, one of the following conditions holds:
1. Tj completes its write phase before Ti starts its read phase.
2. Ti starts its write phase after Tj completes its write phase, and the read_set of Ti has no items in common with the write_set of Tj.
3. Both the read_set and the write_set of Ti have no items in common with the write_set of Tj, and Tj completes its read phase before Ti completes its read phase.
When validating Ti, the first condition is checked first for each transaction Tj, since (1) is the simplest condition to check. If (1) is false then (2) is checked, and if (2) is false then (3) is checked. If none of these conditions holds, the validation fails and Ti is aborted.
Write phase: on a successful validation, the transaction's updates are applied to the database; otherwise, the transaction is restarted.
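A hedged sketch of the validation test, assuming each transaction records its read_set, write_set and the timestamps of its phase boundaries (all names are illustrative):

def validate(ti, others):
    """ti, and each tj in others, are dicts with keys:
    read_set, write_set, start_read, end_read, start_write, end_write."""
    for tj in others:                     # committed or currently validating transactions
        if tj["end_write"] is not None and tj["end_write"] < ti["start_read"]:
            continue                      # condition 1
        if (tj["end_write"] is not None and tj["end_write"] < ti["start_write"]
                and not (ti["read_set"] & tj["write_set"])):
            continue                      # condition 2
        if (not (ti["read_set"] & tj["write_set"])
                and not (ti["write_set"] & tj["write_set"])
                and tj["end_read"] < ti["end_read"]):
            continue                      # condition 3
        return False                      # no condition holds: abort Ti
    return True                           # validation succeeds: enter the write phase

# Example: Ti reads and writes A; Tj wrote B and finished before Ti started reading.
ti = dict(read_set={"A"}, write_set={"A"}, start_read=10, end_read=12,
          start_write=13, end_write=None)
tj = dict(read_set={"B"}, write_set={"B"}, start_read=1, end_read=3,
          start_write=4, end_write=5)
print(validate(ti, [tj]))   # True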

4.2.5 Granularity of Data Items and Multiple Granularity Locking
A lockable unit of data defines its granularity. Granularity can be coarse (the entire database) or fine (a single record or even a single field value). Data item granularity significantly affects concurrency control performance: the degree of concurrency is low for coarse granularity and high for fine granularity. Examples of data item granularity:


1. A field of a database record (an attribute of a tuple).
2. A database record (a tuple of a relation).
3. A disk block.
4. An entire file.
5. The entire database.
To manage such a hierarchy, in addition to read and write locks, three additional locking modes, called intention lock modes, are defined:
Intention-shared (IS): indicates that one or more shared locks will be requested on some descendant node(s).
Intention-exclusive (IX): indicates that one or more exclusive locks will be requested on some descendant node(s).
Shared-intention-exclusive (SIX): indicates that the current node is locked in shared mode but one or more exclusive locks will be requested on some descendant node(s).
These locks are applied using the following compatibility matrix:
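The matrix is given here as a hedged reconstruction of the standard multiple-granularity compatibility matrix (modes IS, IX, S, SIX, X), encoded in Python where True means the two lock modes are compatible:

# Rows: mode already held on the node; columns: mode being requested.
COMPAT = {
    "IS":  dict(IS=True,  IX=True,  S=True,  SIX=True,  X=False),
    "IX":  dict(IS=True,  IX=True,  S=False, SIX=False, X=False),
    "S":   dict(IS=True,  IX=False, S=True,  SIX=False, X=False),
    "SIX": dict(IS=True,  IX=False, S=False, SIX=False, X=False),
    "X":   dict(IS=False, IX=False, S=False, SIX=False, X=False),
}

def compatible(held, requested):
    return COMPAT[held][requested]

print(compatible("IS", "X"))    # False: an X request conflicts with any held lock
print(compatible("IX", "IS"))   # True: intention modes are mutually compatible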

The set of rules which must be followed to produce serializable schedules is:
1. The lock compatibility matrix must be adhered to.
2. The root of the tree must be locked first, in any mode.

3. A node N can be locked by a transaction T in S or IS mode only if the parent node is already locked by T in either IS or IX mode.
4. A node N can be locked by T in X, IX, or SIX mode only if the parent of N is already locked by T in either IX or SIX mode.
5. T can lock a node only if it has not unlocked any node (to enforce the 2PL policy).
6. T can unlock a node N only if none of the children of N are currently locked by T.

Using Locks for Concurrency Control in Indexes
Real-time database systems are expected to rely heavily on indexes to speed up data access and, thereby, help more transactions meet their deadlines. Accordingly, high-performance index concurrency control (ICC) protocols are required to prevent contention for the index from becoming a bottleneck. A real-time ICC protocol called GUARD-link augments the classical B-link protocol with a feedback-based admission control mechanism, and also supports both point and range queries as well as the undo of the index actions of aborted transactions. The performance metrics used in evaluating ICC protocols are the percentage of transactions that miss their deadlines and the fairness with respect to transaction type and size. The performance characteristics of the real-time version of an ICC protocol can be significantly different from the performance of the same protocol in a conventional (non-real-time) database system. In particular, B-link protocols, which are reputed to provide the best overall performance in conventional database systems, perform poorly under heavy real-time loads. The GUARD-link protocol, however, although based on the B-link approach, delivers the best performance (with respect to all performance metrics) for a variety of real-time transaction workloads, by virtue of its admission control mechanism.

4.2.6 Database Recovery Techniques: Recovery Concepts
The database can be updated immediately, but an update operation must be recorded in the log before it is applied to the database. In a single-user system, if a failure occurs, the operations of the interrupted transaction are undone. When concurrent execution is permitted, the recovery process depends on the protocols used for concurrency control; for example, a strict two-phase locking protocol does not allow a transaction to read or write an item unless the transaction that last wrote the item has committed. Database recovery refers to the process of restoring the database to a correct state in the event of a failure.
The need for recovery control involves two types of storage: volatile (main memory) and nonvolatile.

Volatile storage does not survive system crashes. Stable storage represents information that has been replicated in several nonvolatile storage media with independent failure modes.

Failure types
Failures may be of different kinds: system crashes, resulting in loss of main memory; media failures, resulting in loss of parts of secondary storage; application software errors; natural physical disasters; careless or unintentional destruction of data or facilities; and sabotage.

A good DBMS should provide the following facilities to assist with recovery: a backup mechanism, which makes periodic backup copies of the database; logging facilities, which keep track of the current state of transactions and of database changes; a checkpoint facility, which enables in-progress updates to the database to be made permanent; and a recovery manager, which allows the DBMS to restore the database to a consistent state following a failure.

A log file contains information about all updates to the database, in the form of transaction records and checkpoint records. Transaction records contain: the transaction identifier; the type of log record (transaction start, insert, update, delete, abort, commit); the identifier of the data item affected by the database action (for insert, delete, and update operations); the before-image of the data item; the after-image of the data item; and log management information.

A checkpoint is a point of synchronization between the database and the log file. All buffers are force-written to secondary storage, and a checkpoint record is created containing the identifiers of all active transactions. When a failure occurs, redo all transactions that committed since the checkpoint and undo all transactions that were active at the time of the crash. If the database has been damaged, the last backup copy of the database must be restored and the updates of committed transactions reapplied using the log file. If the database is only inconsistent, the changes that caused the inconsistency must be undone, and some transactions may also need to be redone to ensure that their updates reach secondary storage; this does not need the backup copy, since the database can be restored using the before- and after-images in the log file.

Main Recovery Techniques
There are three main recovery techniques: deferred update, immediate update, and shadow paging.

Deferred Update
Updates are not written to the database until after a transaction has reached its commit point. If the transaction fails before commit, it will not have modified the database, so no undoing of changes is required. It may be necessary to redo the updates of committed transactions, since their effect may not yet have reached the database.

Immediate Update
Updates are applied to the database as they occur. The updates of committed transactions need to be redone following a failure, and the effects of transactions that had not committed at the time of failure may need to be undone. It is essential that log records are written before the corresponding writes to the database; this is known as the write-ahead log (WAL) protocol. If there is no "transaction commit" record in the log, then that transaction was active at the time of failure and must be undone. Undo operations are performed in the reverse of the order in which they were written to the log.

Shadow Paging
Two page tables are maintained during the life of a transaction: the current page table and the shadow page table. When the transaction starts, the two page tables are identical. The shadow page table is never changed thereafter and is used to restore the database in the event of failure. During the transaction, the current page table records all updates to the database; when the transaction completes, the current page table becomes the shadow page table. This recovery scheme does not require the use of a log in a single-user environment; in a multiuser environment, a log may still be needed for the concurrency control method. When a transaction begins executing, the current directory, whose entries point to the most recent or current database pages on disk, is copied into a shadow directory. The shadow directory is then saved on disk while the current directory is used by the transaction. When a write_item operation is performed, a new copy of the modified database page is created. To recover from a failure during transaction execution, it is sufficient to free the modified database pages and to discard the current directory.
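A hedged sketch of log-based recovery for the immediate update approach described above, assuming a simple in-memory log of tuples (txn, operation, item, before_image, after_image); all names are illustrative:

def recover(log, database):
    committed = {rec[0] for rec in log if rec[1] == "commit"}
    # Redo pass: reapply after-images of committed transactions, in log order.
    for txn, op, *rest in log:
        if op == "write" and txn in committed:
            item, before, after = rest
            database[item] = after
    # Undo pass: restore before-images of uncommitted transactions, in reverse order.
    for txn, op, *rest in reversed(log):
        if op == "write" and txn not in committed:
            item, before, after = rest
            database[item] = before
    return database

# Example: T1 committed, T2 did not.
log = [("T1", "start"), ("T1", "write", "A", 10, 20), ("T1", "commit"),
       ("T2", "start"), ("T2", "write", "B", 5, 7)]
print(recover(log, {"A": 10, "B": 7}))   # {'A': 20, 'B': 5}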


4.2.7 The ARIES Recovery Algorithm
The ARIES recovery algorithm is based on:
1. WAL (Write-Ahead Logging).
2. Repeating history during redo: ARIES retraces all actions of the database system prior to the crash to reconstruct the database state as of the moment of the crash.
3. Logging changes during undo: this prevents ARIES from repeating completed undo operations if a failure occurs during recovery, causing a restart of the recovery process.
The ARIES recovery algorithm consists of three steps:
1. Analysis: identifies the dirty (updated) pages in the buffer and the set of transactions active at the time of the crash. The appropriate point in the log where the redo pass should start is also determined.
2. Redo: the necessary redo operations are applied.
3. Undo: the log is scanned backwards and the operations of transactions that were active at the time of the crash are undone in reverse order.
The Log and Log Sequence Number (LSN)
Each log record has a log sequence number (LSN). A log record stores:
1. The previous LSN of that transaction: it links the log records of each transaction, acting as a back pointer to the previous record of the same transaction.
2. The transaction ID.

3. The type of log record.
For a write operation the following additional information is logged:
4. The page ID of the page that includes the item.
5. The length of the updated item.
6. Its offset from the beginning of the page.
7. The BFIM (before image) of the item.
8. The AFIM (after image) of the item.
The Transaction Table and the Dirty Page Table
For efficient recovery, the following tables are also stored in the log during checkpointing:
Transaction table: contains an entry for each active transaction, with information such as the transaction ID, the transaction status, and the LSN of the most recent log record for the transaction.
Dirty page table: contains an entry for each dirty page in the buffer, which includes the page ID and the LSN corresponding to the earliest update to that page.
Checkpointing
A checkpoint does the following:
1. Writes a begin_checkpoint record in the log.
2. Writes an end_checkpoint record in the log. With this record, the contents of the transaction table and the dirty page table are appended to the end of the log.
3. Writes the LSN of the begin_checkpoint record to a special file. This special file is accessed during recovery to locate the last checkpoint information.
To reduce the cost of checkpointing and allow the system to continue to execute transactions, ARIES uses fuzzy checkpointing.
The following steps are performed for recovery:
1. Analysis phase: start at the begin_checkpoint record and proceed to the end_checkpoint record, and access the transaction table and dirty page table that were appended to the end of the log. Note that during this phase some other log records may be written to the log and the transaction table may be modified. The analysis phase compiles the set of redo and undo operations to be performed, and ends.

2. Redo phase: starts from the point in the log up to which all dirty pages are known to have been flushed, and moves forward to the end of the log. Any change that appears in the dirty page table is redone.
3. Undo phase: starts from the end of the log and proceeds backward while performing the appropriate undo operations. For each undo, a compensating log record is written to the log. Recovery completes at the end of the undo phase.
An example of the working of the ARIES scheme:
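A highly simplified sketch of the three passes (my illustration, not the original worked example), assuming log records of the form (lsn, txn, kind, page, before, after) and ignoring page LSN comparisons and compensation log records:

def aries_recover(log, pages):
    # Analysis: find transactions without a commit record and the dirty pages.
    committed = {r[1] for r in log if r[2] == "commit"}
    dirty = {r[3]: r[0] for r in reversed(log) if r[2] == "update"}  # page -> earliest LSN
    # (In real ARIES the dirty page table bounds the redo pass; unused in this toy sketch.)
    # Redo: repeat history for every logged update, committed or not.
    for lsn, txn, kind, *rest in log:
        if kind == "update":
            page, before, after = rest
            pages[page] = after
    # Undo: roll back loser transactions, scanning the log backwards.
    for lsn, txn, kind, *rest in reversed(log):
        if kind == "update" and txn not in committed:
            page, before, after = rest
            pages[page] = before
    return pages

log = [(1, "T1", "update", "P1", "old1", "new1"),
       (2, "T2", "update", "P2", "old2", "new2"),
       (3, "T1", "commit")]
print(aries_recover(log, {"P1": "old1", "P2": "old2"}))  # {'P1': 'new1', 'P2': 'old2'}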

4.2.8 Recovery in Multidatabase Systems
A multidatabase system is a special distributed database system where one node may be running a relational database system under Unix, another may be running an object-oriented system under Windows, and so on. A transaction may run in a distributed fashion at multiple nodes. In this execution scenario the transaction commits only when all of these nodes agree to commit individually the part of the transaction they were executing. This commit scheme is referred to as two-phase commit (2PC). If any one of these nodes fails or cannot commit its part of the transaction, the transaction is aborted. Each node recovers the transaction under its own recovery protocol.
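A hedged sketch of the coordinator's side of two-phase commit, assuming each participant exposes prepare(), commit() and abort() calls (illustrative API; the two phases are described in more detail below, and real systems also log each decision):

def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare (force-write its part and vote).
    votes = [p.prepare() for p in participants]
    if all(vote == "OK" for vote in votes):
        # Phase 2: everyone voted OK, so the global decision is commit.
        for p in participants:
            p.commit()
        return "committed"
    # Any failure or negative vote aborts the whole multidatabase transaction.
    for p in participants:
        p.abort()
    return "aborted"

class Participant:
    def __init__(self, ok=True): self.ok = ok
    def prepare(self): return "OK" if self.ok else "NO"
    def commit(self): pass
    def abort(self): pass

print(two_phase_commit([Participant(), Participant(ok=False)]))  # aborted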

In some cases a single transaction (called a multidatabase transaction) may require access to multiple databases. To maintain the atomicity of a multidatabase transaction, a two-level recovery mechanism is necessary: a global recovery manager, or coordinator, is needed in addition to the local recovery managers.
Phase 1: when all participating databases signal the coordinator that the part of the multidatabase transaction involving each of them has concluded, the coordinator sends a "prepare for commit" message; the participating databases reply OK or not OK according to the result of their force-writes.
Phase 2: if all participating databases reply OK, the transaction is successful and the coordinator sends a commit signal to the participating databases.

Database Backup and Recovery from Catastrophic Failures
A key assumption so far has been that the system log is maintained on disk and is not lost as a result of the failure. The recovery manager of a DBMS must also be equipped to handle more catastrophic failures such as disk crashes. The main technique used to handle such cases is database backup: the whole database and the log are periodically copied onto a cheap storage medium such as magnetic tape.

Database Security and Authorization: Database Security Issues
a. Security Issues - Access Controls
The most common form of access control in a relational database is the view (for a detailed discussion of relational databases, see [RobCor93]). The view is a logical table, which is created with the SQL VIEW command. This table contains data from the database obtained by additional SQL operations such as JOIN and SELECT. If the database is unclassified, the source for the view is the entire database. If, on the other hand, the database is subject to multilevel classification, then the source for the view is that subset of the database that is at or below the classification level of the user. Users can read or modify data in their view, but the view prohibits users from accessing data at a classification level above their own. In fact, if the view is properly designed, a user at a lower classification level will be unaware that data exists at a higher classification level [Denn87a]. In order to define what data can be included in a view source, all data in the database must receive an access classification. Denning [Denn87a] lists several potential access classes that can be applied.

These include: (1) Type dependent: Classification is determined based on the attribute associated with the data. (2) Value dependent: Classification is determined based on the value of the data. (3) Source level: Classification of the new data is set equivalent to the classification of the data source. (4) Source label: The data is arbitrarily given a classification by the source or by the user who enters the data. Classification of data and development of legal views become much more complex when the security goal includes the reduction of the threat of inference attacks. Inference is typically made from data at a lower classification level that has been derived from higher level data. The key to this relationship is the derivation rule, which is defined as the operation that creates the derived data (for example, a mathematical equation). A derivation rule also specifies the access class of the derived data. To reduce the potential for inference, however, the data elements that are inputs to the derivation must be examined to determine whether one or more of these elements are at the level of the derived data. If this is the case, no inference problem exists. If, however, all the elements are at a lower level than the derived data, then one or more of the derivation inputs must be promoted to a higher classification level [Denn87a]. The use of classification constraints to counter inference, beyond the protections provided by the view, requires additional computation. Thuraisingham and Ford [ThurFord95] discuss one way that constraint processing can be implemented. In their model, constraints are processed in three phases. Some constraints are processed during design (these may be updated later), others are processed when the database is queried to authorize access and counter inference, and many are processed during the update phase. Their strategy relies on two inference engines, one for query processing and one for update processing. Essentially, the inference engines are middlemen, which operate between the DBMS and the interface (see figure 1). According to Thuraisingham and Ford, the key to this strategy is the belief that most inferential attacks will occur as a result of summarizing a series of queries (for example, a statistical inference could be made by using a string of queries as a sample) or by interpreting the state change of certain variables after an update. The two inference engines work by evaluating the current task according to a set of rules and determining a course of action. The inference engine for updates dynamically revises the security constraints of the database as the security conditions of the organization change and as the security characteristics of the data stored in the database change. The inference engine for query processing evaluates each entity requested in the query, all the data released in a specific period that is at the security level of the current query, and relevant data available externally at the same security level. This is called the knowledge

base. The processor evaluates the potential inferences from the union of the knowledge base and the query's potential response. If the user's security level dominates the security levels of all of the potential inferences, the response is allowed [ThurFord95].
b. Security Issues - Integrity
The integrity constraints in the relational model can be divided into two categories: (1) implicit constraints and (2) explicit constraints. Implicit constraints, which include domain, relational, and referential constraints, enforce the rules of the relational model. Explicit constraints enforce the rules of the organization served by the DBMS. As such, explicit constraints are one of the two key elements (along with views) of security protection in the relational model [BellGris92]. Accidental or deliberate modification of data can be detected by explicit constraints. Pfleeger [Pflee89] lists several error detection methods, such as parity checks, that can be enforced by explicit constraints. The local integrity constraints discussed earlier are also examples of explicit constraints, and multilevel classification constraints are another example. A final type of explicit constraint enforces polyinstantiation integrity. Polyinstantiation refers to the replication of a tuple in a multilevel access system. This occurs when a user at a lower level L2 enters a tuple into the database which has the same key as a tuple which is classified at a higher level L1 (L1 > L2). The DBMS has two options: it can refuse the entry, which implies that a tuple with the same key exists at L1, or it can allow the entry. If it allows the entry, then two tuples with identical keys exist in the database. This condition is called polyinstantiation [Haig91]. Obvious integrity problems can result, and the literature contains several algorithms for ensuring polyinstantiation integrity. Typically, explicit constraints are implemented using the SQL ASSERT or TRIGGER commands. ASSERT statements are used to prevent an integrity violation; therefore, they are applied before an update. The TRIGGER is part of a response activation mechanism: if a problem with the existing database is detected (for example, an error is detected after a parity check), then a predefined action is initiated [BellGris92]. More complicated explicit constraints, like multilevel classification constraints, require additional programming with a 3GL. This is the motivation for the constraint processor. So SQL, and consequently the relational model alone, cannot protect the database from a determined inferential attack.

4.2.9 Object-Oriented Database Security
Object-oriented Databases
While it is not the intent of this section to present a detailed description of the object-oriented model, the reader may be unfamiliar with the elements of an object-oriented database. For this reason, we take a brief look at the object-oriented model's basic structure; for a more detailed discussion, the interested reader should see [Bert92, Stein94, or Sud95]. The basic element of an object-oriented database is the object. An object is defined by a class; in essence, classes are the blueprints for objects. In the object-oriented model, classes are arranged in a hierarchy. The root class is found at the top of the hierarchy; this is the parent class for all other classes in the model. A class that is a descendant of a parent inherits the properties of the parent class. As needed, these properties can be modified and extended in the descendant class [MilLun92]. An object is composed of two basic elements: variables and methods. An object holds three basic variable types:
(1) Object class: this variable keeps a record of the parent class that defines the object.
(2) Object ID (OID): a record of the specific object instance. The OID is also kept in an OID table. The OID table provides a map for finding and accessing data in the object-oriented database. As we will see, this also has special significance in creating a secure database.
(3) Data stores: these variables store data in much the same way that attributes store data in a relational tuple [MilLun92].
Methods are the actions that can be performed by the object and the actions that can be performed on the data stored in the object variables. Methods perform two basic functions: they communicate with other objects, and they perform reads and updates on the data in the object. Methods communicate with other objects by sending messages. When a message is sent to an object, the receiving object creates a subject. Subjects execute methods; objects do not. If the subject has suitable clearance, the message will cause the subject to execute a method in the receiving object. Often, when the action at the called object ends, the subject will execute a method that sends a message to the calling object indicating that the action has ended [MilLun92]. Methods perform all reading and writing of the data in an object. For this reason, we say that the data is encapsulated in the object. This is one of the important differences between object-oriented and relational databases [MilLun92]. All control for access,

modification, and integrity start at the object level. For example, if no method exists for updating a particular object's variable, then the value of that variable is constant. Any change in this condition must be made at the object level. Access Controls As with the relational model, access is controlled by classifying elements of the database. The basic element of this classification is the object. Access permission is granted if the user has sufficient security clearance to access the methods of an object. Millen and Lunt [MilLun92] describe a security model that effectively explains the access control concepts in the object-oriented model. Their model is based on six security properties: Property 1 (Hierarchy Property). The level of an object must dominate that of its class object. Property 2 (Subject Level Property). The security level of a subject dominates the level of the invoking subject and it also dominates the level of the home object. Property 3 (Object Locality Property). A subject can execute methods or read or write variables only in its home object. Property 4 (*-Property) A subject may write into its home object only if its security is equal to that of the object. Property 5 (Return value property) A subject can send a return value to its invoking subject only if it is at the same security level as the invoking subject. Property 6 (Object creation property) The security level of a newly-created object dominates the level of the subject that requested the creation [MilLun92]. Property 1 ensures that the object that inherits properties from its parent class has at least the same classification level as the parent class. If this were not enforced, then users could gain access to methods and data for which they do not have sufficient clearance. Property 2 ensures that the subject created by the receiving object has sufficient clearance to execute any action from that object. Hence, the classification level given to the subject must be equal to at least the highest level of the entities involved in the action. Property 3 enforces encapsulation. If a subject wants to access data in another object, a message must be sent to that object where a new subject will be created. Property 6 states that new objects must have at least as high a clearance level as the subject that creates the object. This property prevents the creation of a covert channel. Properties 4 and 5 are the key access controls in the model.

Property 4 states that the subject must have sufficient clearance to update data in its home object. If the invoking subject does not have as high a classification as the called object's subject, an update is prohibited. Property 5 ensures that if the invoking subject from the calling object does not have sufficient clearance, the subject in the called object will not return a value. The object-oriented model and the relational model minimize the potential for inference in a similar manner. Remaining consistent with encapsulation, the classification constraints are executed as methods. If a potential inference problem exists, access to a particular object is prohibited [MilLun92]. Integrity As with classification constraints, integrity constraints are also executed at the object level [MilLun92]. These constraints are similar to the explicit constraints used in the relational model. The difference is in execution. An object-oriented database maintains integrity before and after an update by executing constraint checking methods on the affected objects. As we saw in section 4.1.2., a relational DBMS takes a more global approach. One of the benefits of encapsulation is that subjects from remote objects do not have access to a called object's data. This is a real advantage that is not present in the relational DBMS. Herbert [Her94] notes that an object oriented system derives a significant benefit to database integrity from encapsulation. This benefit stems from modularity. Since the objects are encapsulated, an object can be changed without affecting the data in another object. So, the process that contaminated one element is less likely to affect another element of the database. 4.2.10 Object-Oriented Database Security Problems in the Distributed Environment Sudama [Sud95] states that there are many impediments to the successful implementation of a distributed object-oriented database. The organization of the object-oriented DDBMS is more difficult than the relational DDBMS. In a relational DDBMS, the role of client and server is maintained. This makes the development of multilevel access controls easier. Since the roles of client and server are not well defined in the object-oriented model, control of system access and multilevel access is more difficult. System access control for the object-oriented DDBMS can be handled at the host site in a procedure similar to that described for the relational DDBMS. Since there is no clear definition of client and server, however, the use of replicated multisite approval would be impractical. Multilevel access control problems arise when developing effective and efficient authorization algorithms for subjects that need to send messages to multiple objects across several geographically separate locations. According to Sudama [Sud95], there are

currently no universally accepted means for enforcing subject authorization in a pure object-oriented distributed environment. This means that, while individual vendors have developed their own authorization systems, there is no pure object-oriented, vendor-independent standard which allows object-oriented database management systems (OODBMS) from different vendors (a heterogeneous distributed system) to communicate in a secure manner. Without subject authorization, the controls described in the previous section cannot be enforced. Since inheritance allows one object to inherit the properties of its parent, the database is easily compromised. So, without effective standards, there is no way to enforce multilevel classification. Sudama [Sud95] notes that one standard does exist, called OSF DCE (Open Software Foundation's Distributed Computing Environment), that is vendor-independent, but it is not strictly an object-oriented database standard. While it does provide subject authorization, it treats the distributed object environment as a client/server environment, as is done in the relational model. He points out that this problem may be corrected in the next release of the standard. The major integrity concern in a distributed environment that is not a concern in the centralized database is the distribution of individual objects. Recall that an RDBMS allows the fragmentation of tables across sites in the system. It is less desirable to allow the fragmentation of objects, because this can violate encapsulation. For this reason, fragmentation should be explicitly prohibited with an integrity constraint [Her94].
The DBA has a DBA account in the DBMS, which provides powerful capabilities that are not made available to regular database accounts and users. The DBA account can be used to perform the following types of actions:
1. Account creation: creates a new account and password for a user or a group of users.
2. Privilege granting: permits the DBA to grant certain privileges to certain accounts.

3. Privilege revocation: permits the DBA to revoke certain privileges that were previously given to certain accounts.
4. Security level assignment: assigns user accounts to the appropriate security classification level.
The DBA is fully responsible for the overall security of the system.

Discretionary Access Control Based on Granting / Revoking of Privileges
The typical method is based on the granting and revoking of privileges. There are two levels for assigning privileges to use the database system:
1. The account level: the DBA specifies the particular privileges that each account holds independently of the relations in the database (CREATE TABLE, CREATE VIEW, DROP privileges).
2. The relation level: controls the privilege to access each individual relation or view in the database (generally known as the access matrix model, where the rows are subjects such as users, accounts and programs, and the columns are objects such as relations, records, columns, views and operations).
In SQL the following types of privileges can be granted on each individual relation R:
SELECT: gives the account retrieval privilege on R.
MODIFY: gives the account the capability to modify tuples of R.
REFERENCES: gives the account the capability to reference relation R when specifying integrity constraints.
The view mechanism is an important discretionary authorization mechanism.
Example: the DBA can issue
GRANT CREATETAB TO ACC1;
CREATE SCHEMA COMPANY AUTHORIZATION ACC1;
Next ACC1 can issue
GRANT INSERT, DELETE ON EMPLOYEE, DEPARTMENT TO ACC2;
Next ACC1 can issue
GRANT SELECT ON EMPLOYEE, DEPARTMENT TO ACC3 WITH GRANT OPTION;
Now ACC3 can issue
GRANT SELECT ON EMPLOYEE TO ACC4;

Now ACC1 can issue
REVOKE SELECT ON EMPLOYEE FROM ACC3;
ACC1 can also issue
CREATE VIEW EMPVIEW AS SELECT NAME, BDATE, ADDRESS FROM EMPLOYEE WHERE DNO = 20;
GRANT SELECT ON EMPVIEW TO ACC3 WITH GRANT OPTION;
(The view is renamed EMPVIEW here so that it does not clash with the EMPLOYEE base table.)
Finally, ACC1 can issue
GRANT UPDATE ON EMPLOYEE (SALARY) TO ACC4;

Mandatory Access Control for Multilevel Security
The discretionary access control technique of granting and revoking privileges is an all-or-nothing method. The need for multilevel security exists in government, industry and corporate applications. Typical security classes are top secret (TS), secret (S), confidential (C) and unclassified (U), where TS > S > C > U. Two restrictions are placed on data access based on the subject/object classifications:
1. A subject S is not allowed read access to an object O unless class(S) >= class(O).
2. A subject S is not allowed to write an object O unless class(S) <= class(O).

Statistical Database Security
Statistical databases are used mainly to produce statistics on various populations. A population is a set of tuples of a relation that satisfy some selection condition. Statistical database security techniques must prevent the retrieval of individual data. In some cases it may be possible to infer the values of individual tuples from a sequence of statistical queries. The possibility of inferring individual information from statistical queries is reduced if no statistical query is permitted whenever the number of tuples in the population specified by the selection condition falls below some threshold.
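A minimal sketch of the two mandatory access control restrictions above, assuming numeric clearance levels (U=1, C=2, S=3, TS=4; names are illustrative):

LEVEL = {"U": 1, "C": 2, "S": 3, "TS": 4}

def can_read(subject_class, object_class):
    # Restriction 1 (no read up): the subject's class must dominate the object's class.
    return LEVEL[subject_class] >= LEVEL[object_class]

def can_write(subject_class, object_class):
    # Restriction 2 (no write down): the object's class must dominate the subject's class.
    return LEVEL[subject_class] <= LEVEL[object_class]

print(can_read("S", "TS"))   # False: a Secret subject cannot read Top Secret data
print(can_write("S", "C"))   # False: a Secret subject cannot write down to Confidential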


4.3 Revision Points


o Concurrency control serves to enforce isolation (through mutual exclusion) among conflicting transactions, to preserve database consistency through consistency-preserving execution of transactions, and to resolve read-write and write-write conflicts.

o Two-Phase Locking Techniques: two lock modes are available, (a) shared and (b) exclusive.
o Deadlock: a situation in which two transactions wait for each other to release the lock on a particular item.
o Starvation: occurs when a specific transaction repeatedly waits or is restarted and never gets a chance to proceed further.
o Timestamp: a monotonically increasing value indicating the age of an operation or a transaction; a larger timestamp value indicates a more recent operation.
o Granularity of data items: a lockable unit of data defines its granularity. Granularity can range from the entire database down to a single record or field, and it affects concurrency control performance.
o Shadow paging: when a transaction begins executing, the current directory, whose entries point to the most recent or current database pages on disk, is copied into a directory known as the shadow directory. To recover from a failure during transaction execution, it is sufficient to free the modified database pages and to discard the current directory.

o The account level: the DBA specifies the particular privileges that each account holds independently of the relations in the database (CREATE TABLE, CREATE VIEW, DROP privileges).
o The relation level: controls the privilege to access each individual relation or view in the database (generally known as the access matrix model, where the rows are subjects such as users, accounts and programs, and the columns are objects such as relations, records, columns, views and operations).

4.4 Intext questions


1. Illustrate the ARIES recovery algorithm with an example.
2. What is a timestamp?
3. Write a note on concurrency control techniques.
4. Give a brief account of the deadlock situation.
5. What are the validation (optimistic) concurrency control techniques?
6. Discuss the security issues in object-oriented databases.

4.5 Summary
Concurrency control enforces isolation among conflicting transactions in a database management system. In multiversion 2PL, read and write operations from conflicting transactions can be processed concurrently. This improves concurrency, but it may delay transaction commit because certify locks must be obtained on all of a transaction's writes; it avoids cascading aborts, but, as in the strict two-phase locking scheme, conflicting transactions may become deadlocked. The degree of concurrency is low for coarse granularity and high for fine granularity. When concurrent execution is permitted, the recovery process depends on the protocols used for concurrency control. The transaction table contains an entry for each active transaction, with information such as the transaction ID, the transaction status, and the LSN of the most recent log record for the transaction. The dirty page table contains an entry for each dirty page in the buffer, which includes the page ID and the LSN corresponding to the earliest update to that page. A multidatabase system is a special distributed database system where one node may be running a relational database system under Unix, another may be running an object-oriented system under Windows, and so on. A transaction may run in a distributed fashion at multiple nodes; in this execution scenario the transaction commits only when all of these nodes agree to commit individually the part of the transaction they were executing.

This commit scheme is referred to as two-phase commit (2PC). If any one of these nodes fails or cannot commit its part of the transaction, the transaction is aborted. Each node recovers the transaction under its own recovery protocol. The discretionary access control technique of granting and revoking privileges is an all-or-nothing method. The recovery manager of a DBMS must also be equipped to handle more catastrophic failures such as disk crashes. Statistical database security techniques must prevent the retrieval of individual data; in some cases it may be possible to infer the values of individual tuples from a sequence of statistical queries.

4.6. Terminal Questions


1. ______________ helps to enforce isolation among conflicting transactions that take part in database management.
2. What is the purpose of concurrency control?
3. The ______ table contains an entry for each dirty page in the buffer, which includes the page ID and the LSN corresponding to the earliest update to that page.
4. List the main recovery techniques.
5. What is a timestamp?
6. What is the ARIES algorithm based on?

4.7 Supplementary Materials


[BellGris92] Bell, David and Jane Grimson, Distributed Database Systems. Wokingham, England: Addison-Wesley, 1992.
[Bert92] Bertino, Elisa, Data Hiding and Security in Object-Oriented Databases, in Proceedings of the Eighth International Conference on Data Engineering, pp. 338-347, February 1992.

4.8 Assignment
Prepare an assignment on object-oriented database security.

4.9 Reference Books


Elmasri, R. & Navathe, S. B. (2000). Fundamentals of Database Systems (3rd ed.).
[Denn87a] Denning, Dorothy E. et al., Views for Multilevel Database Security, IEEE Transactions on Software Engineering, vol. SE-13, no. 2, pp. 129-139, February 1987.

[Her94] Herbert, Andrew, Distributing Objects, In Distributed Open Systems, F.M.T. Brazier and D. Johansen eds., pp. 123-132, Los Alamitos: IEEE Computer Press, 1994. [Inf96] Illustra Object Relational Database Management System, Informix white paper from the Illustra Document Database, 1996. [JajSan90] Jajodia, Sushil and Ravi Sandhu, Polyinstantiation Integrity in Multilevel Relations, In Proceedings IEEE Symposium on Research in Security and Privacy, pp. 104-115, 1990.

4.10 Learning Activities


Individuals or groups of students may visit the library for further reading on these topics.

4.11 Keywords
1. Concurrency control
2. Timestamp
3. Shadow paging
4. Immediate update
5. Deferred update
6. Dirty page table
7. Deadlock
