
DBMS

UNIT- I
INTRODUCTION TO DATABASE MANAGEMENT SYSTEM
INTRODUCTION:

A database management system is an application used to create and access databases.


Although single-user database management systems exist (Microsoft Access, for example), these
applications are typically used by businesses and larger organizations. Database management
systems are also widely used by Internet applications, web servers, and e-commerce sites.
Basic Functions of A Database Management System
A typical database management system application has several primary functions:
Maintain links to data: The database management system is responsible for establishing a
physical connection to the data in the database. Users can be on another computer, in
another building or even in another country. As long as the connection is maintained, the
database can be manipulated from any location.
Manage access to data: The database management system must control the flow of data to
ensure that records are not accidentally garbled. Limiting access to a record to one user at a
time is a key requirement. Tracking changes to data records (called transactions) is
important in the case of an error or system crash. A database management system provides
the ability to undo or "rollback" incomplete or erroneous transactions.
Maintain data access security: A database management system can limit user access to
data. The system can prevent unauthorized access as well as limiting the type of access
users may have.
Clearly, the basic element of an information system that performs the specified functions is the
database. Building such systems therefore requires a modern DBMS and applications built on top of it.
The basic tasks solved by DBMS applications include:
Efficient processing of complex data, including the references that express the relations between data items.
Building Internet shops and distributed information systems.
Building virtual company offices and virtual kiosks.
Storage and reproduction of graphic images, video and audio.
Creation of websites with virtually unlimited capabilities.
PURPOSE OF DATABASE SYSTEM:

The typical file processing system is supported by a conventional operating system. The system
stores permanent records in various files, and it needs different application programs to extract
records from, and add records to, the appropriate files. A file processing system has a number of
major disadvantages.
Data Redundancy & Inconsistency
In file processing, every user group maintains its own files for handling its data processing
applications.
Example:
Consider the UNIVERSITY database. Here, two groups of users might be the course
registration personnel and the accounting office. The accounting office also keeps data on
registration and related billing information, whereas the registration office keeps track of
student courses and grades. Storing the same data multiple times is called data redundancy.
This redundancy leads to several problems.
Need to perform a single logical update multiple times.
Storage space is wasted.
Files that represent the same data may become inconsistent.
Data inconsistency arises when the various copies of the same data no longer agree.
Example:
One user group may enter a student's birth date erroneously as JAN-19-1984, whereas the other
user groups may enter the correct value of JAN-29-1984.
Difficulty in accessing data: File processing environments do not allow needed data to
be retrieved in a convenient and efficient manner.
Example:
Suppose that one of the bank officers needs to find out the names of all customers who live
within a particular area. The bank officer has now two choices: either obtain the list of all
customers and extract the needed information manually or ask a system programmer to write
the necessary application program. Both alternatives are obviously unsatisfactory. Suppose that
such a program is written, and that, several days later, the same officer needs to trim that list to
include only certain customers; once again, no suitable program exists, and a new one must be
written. The underlying difficulty is that conventional file systems do not support such ad hoc
retrieval conveniently.
View of Data
A DBMS is a collection of interrelated data and a set of programs that allow users to access and
modify that data. The major purpose of a DBMS is to provide users with an abstract view of the
data. For the system to be usable, data must be retrieved from it efficiently.
Data abstraction
Data abstraction is amazingly useful because it allows humans to understand and build
complex systems like databases.
A good place to start understanding the definition of data abstraction is to think about the way
the word 'abstract' is used when we talk about a long document. The abstract is the shortened,
simplified form. We often read it to get an overview before reading the entire paper. (Actually
we often read it INSTEAD of reading the paper, but that's another issue.)
The three formal abstraction layers we usually use are:
User model: how the user describes the database.
Logical model: more formal and detailed, often rendered as an entity-relationship (ER) model.
Physical model: adds the physical detail, such as indexing and data types.
Data abstraction is simply a way of turning a complex problem into a manageable one.
DATABASE SCHEMA

A database schema is the skeleton structure that represents the logical view of the entire
database. It describes how the data is organized and how the relations among the data are
associated. It formulates all the constraints that are to be applied to the data in the relations
that reside in the database. A database schema defines its entities and the relationships among
them. It is a descriptive detail of the database, which can be depicted by means of schema
diagrams. All these activities are done by the database designer to help programmers
understand every aspect of the database.
A database schema can be divided broadly into two categories:
Physical Database Schema

This schema pertains to the actual storage of data and its form of storage: files, indices, etc.
It defines how the data will be stored in secondary storage.
Logical Database Schema

This defines all the logical constraints that need to be applied to the data stored. It defines
tables, views, integrity constraints, etc.
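For example, the distinction can be seen in SQL (a minimal sketch; the student table and index name are illustrative, not taken from these notes):

-- Logical schema: the table, its columns and an integrity constraint
CREATE TABLE student (
    student_id   INTEGER     PRIMARY KEY,
    student_name VARCHAR(50) NOT NULL,
    class        VARCHAR(10)
);

-- Physical schema decision: an index chosen to speed up lookups by name
CREATE INDEX idx_student_name ON student (student_name);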
DATABASE INSTANCE

It is important to distinguish these two terms. The database schema is the skeleton of the
database. It is designed before the database exists at all, and it is very hard to change once the
database is operational. The schema does not contain any data or information.
A database instance is the state of an operational database, with data, at a given moment in
time; it is a snapshot of the database. Database instances change over time. A DBMS ensures
that every instance (state) is a valid state by enforcing all the validations, constraints and
conditions that the database designers have imposed or that are expected of the DBMS itself.
DATA MODELS

A data model tells how the logical structure of a database is modeled. Data models are
fundamental entities for introducing abstraction in a DBMS. They define how data items are
connected to each other and how data is processed and stored inside the system.
The very first data models were flat data models, in which all the data was kept in the same
plane. Because these early data models were not very scientific, they were prone to introducing
much duplication and many update anomalies.
Entity-Relationship Model
The Entity-Relationship model is based on the notion of real-world entities and the relationships
among them. While formulating a real-world scenario into a database model, the ER Model creates
entity sets, relationship sets, general attributes and constraints.
ER Model is best used for the conceptual design of database. ER Model is based on:
Entities and their attributes
Relationships among entities
These concepts are explained below.

Entity: An entity in the ER Model is a real-world entity that has some properties called attributes.
Every attribute is defined by its set of permitted values, called its domain. For example, in a school
database, a student is considered an entity. A student has various attributes like name, age,
class, etc.
Relationship: The logical association among entities is called a relationship. Relationships are
mapped with entities in various ways. Mapping cardinalities define the number of associations
between two entities.
Mapping cardinalities:
One to one
One to many
Many to one
Many to many
RELATIONAL MODEL

The most popular data model in DBMS is the Relational Model. It is a more scientific model than
the others. This model is based on first-order predicate logic and defines a table as an n-ary relation.


The main highlights of this model are:

Data is stored in tables called relations.
Relations can be normalized.
In normalized relations, the values saved are atomic.
Each row in a relation contains a unique value.
Each column in a relation contains values from the same domain.

DATABASE LANGUAGES
Data Definition Language (DDL)
DDL statements are used to define the database structure or schema.

CREATE - to create objects in the database


ALTER - alters the structure of the database
DROP - delete objects from the database
TRUNCATE - remove all records from a table, including all space allocated for the
records
COMMENT - add comments to the data dictionary
RENAME - rename an object
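A hedged sketch of these DDL statements on a hypothetical student table (the names are illustrative only; ALTER TABLE ... ADD (...) and RENAME are shown in the Oracle style used elsewhere in these notes):

CREATE TABLE student (roll_no INTEGER, name VARCHAR(30)); -- create an object
ALTER TABLE student ADD (marks INTEGER);                  -- alter its structure
RENAME student TO learner;                                -- rename the object
TRUNCATE TABLE learner;                                   -- remove all records
DROP TABLE learner;                                       -- delete the object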

Data Manipulation Language (DML)


DML statements are used for managing data within schema objects.

SELECT - retrieve data from a database


INSERT - insert data into a table
UPDATE - updates existing data within a table
DELETE - deletes records from a table; the space allocated for the records remains
MERGE - UPSERT operation (insert or update)
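As an illustration, the same hypothetical student table can be manipulated with DML as follows (a sketch, not from the original notes):

INSERT INTO student (roll_no, name) VALUES (1, 'Ravi');   -- add a new row
UPDATE student SET name = 'Ravi K' WHERE roll_no = 1;     -- change existing data
SELECT name FROM student WHERE roll_no = 1;               -- retrieve data
DELETE FROM student WHERE roll_no = 1;                    -- remove the row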


CALL - call a PL/SQL or Java subprogram


EXPLAIN PLAN - explain access path to data
LOCK TABLE - control concurrency

Data Control Language (DCL)

GRANT - gives users access privileges to the database


REVOKE - withdraw access privileges given with the GRANT command
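For example (a sketch; the user name clerk and the student table are hypothetical):

GRANT SELECT, INSERT ON student TO clerk;  -- give clerk read and insert access
REVOKE INSERT ON student FROM clerk;       -- withdraw the insert privilege only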

Transaction Control (TCL)


TCL statements are used to manage the changes made by DML statements. It allows statements
to be grouped together into logical transactions.

COMMIT - save work done


SAVEPOINT - identify a point in a transaction to which you can later roll back
ROLLBACK - restore the database to its original state since the last COMMIT
SET TRANSACTION - Change transaction options like isolation level and what
rollback segment to use
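A small illustrative transaction using these statements might look like this (a sketch assuming the hypothetical student table from the DML example):

INSERT INTO student (roll_no, name) VALUES (2, 'Latha');
SAVEPOINT after_insert;                 -- a point we can later roll back to
UPDATE student SET name = 'Lata' WHERE roll_no = 2;
ROLLBACK TO after_insert;               -- undo the update, keep the insert
COMMIT;                                 -- make the insert permanent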

TRANSACTION MANAGEMENT
ACID Properties

A transaction may contain several low-level tasks, and a transaction is a very small unit of a
program. A transaction in a database system must maintain certain properties in order to
ensure the accuracy of its completeness and data integrity. These properties are referred to as
the ACID properties and are described below:

Atomicity: Though a transaction involves several low-level operations, this property
states that a transaction must be treated as an atomic unit; that is, either all of its
operations are executed or none are. There must be no state in the database where the
transaction is left partially completed. States are defined either before the
execution of the transaction or after the execution/abortion/failure of the transaction.

Consistency: This property states that after the transaction finishes, the database must
remain in a consistent state. There must be no possibility that some data is
incorrectly affected by the execution of the transaction. If the database was in a consistent
state before the execution of the transaction, it must remain consistent after the
execution as well.

Durability: This property states that in any case all updates made on the database will
persist, even if the system fails and restarts. If a transaction writes or updates some data
in the database and commits, that data will always be there in the database. If the
transaction commits but the data is not yet written to disk when the system fails, that data
will be updated once the system comes back up.


Isolation: In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that all the transactions
will be carried out and executed as if each were the only transaction in the system. No
transaction will affect the existence of any other transaction.
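The classic illustration of these properties is a funds transfer written as one transaction (a sketch; the account table and balances are hypothetical):

-- Move 100 from account A to account B: atomicity requires that both
-- updates happen or neither does; consistency requires the total balance
-- to be unchanged; durability makes the committed result survive a restart.
UPDATE account SET balance = balance - 100 WHERE acc_no = 'A';
UPDATE account SET balance = balance + 100 WHERE acc_no = 'B';
COMMIT;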

Serializability

When more than one transaction is executed by the operating system in a multiprogramming
environment, there is a possibility that instructions of one transaction are interleaved with
those of some other transaction.

Schedule: A chronological execution sequence of transactions is called a schedule. A
schedule can have many transactions in it, each comprising a number of
instructions/tasks.

Serial Schedule: A schedule in which transactions are aligned in such a way that one
transaction is executed first; when the first transaction completes its cycle, the next
transaction is executed, and so on. Transactions are ordered one after the other. This type
of schedule is called a serial schedule, as transactions are executed in a serial manner.

In a multi-transaction environment, serial schedules are considered the benchmark. The
execution sequence of instructions within a transaction cannot be changed, but two transactions
can have their instructions executed in random fashion. This execution does no harm if the two
transactions are mutually independent and work on different segments of data, but if they
work on the same data, the results may vary, and this ever-varying result may leave the
database in an inconsistent state.
To resolve the problem, we allow parallel execution of transaction schedule if transactions in it
are either serializable or have some equivalence relation between or among transactions.
Equivalence schedules

Schedules can be equivalent in the following ways:

Result Equivalence: If two schedules produce the same results after execution, they are said
to be result equivalent. They may yield the same result for some values and different
results for other values. That is why this equivalence is not generally considered
significant.

View Equivalence: Two schedules are view equivalent if the transactions in both
schedules perform similar actions in a similar manner.
Example:
If T reads the initial data in S1, then T also reads the initial data in S2.
If T reads a value written by J in S1, then T also reads the value written by J in S2.
If T performs the final write on a data value in S1, then T also performs the final write on
that data value in S2.


Conflict Equivalence: Two operations are said to be conflicting if they have the
following properties:
o Both belong to separate transactions.
o Both access the same data item.
o At least one of them is a "write" operation.
Two schedules having multiple transactions with conflicting operations are said to
be conflict equivalent if and only if:
o Both schedules contain the same set of transactions.
o The order of conflicting pairs of operations is maintained in both schedules.

View equivalent schedules are view serializable and conflict equivalent schedules are conflict
serializable. All conflict serializable schedules are view serializable too.
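As a small worked example (using the common notation Ri(X)/Wi(X) for a read/write of item X by transaction Ti; the schedules are illustrative, not from the original notes), consider:

S1 (interleaved):    R1(A) W1(A) R2(A) R1(B) W2(A) W1(B)
S2 (serial T1, T2):  R1(A) W1(A) R1(B) W1(B) R2(A) W2(A)

The conflicting pairs, all on item A, are R1(A)-W2(A), W1(A)-R2(A) and W1(A)-W2(A); in both schedules each pair occurs with the T1 operation first, so S1 is conflict equivalent to the serial schedule S2 and is therefore conflict serializable.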
States of Transactions

A transaction in a database can be in one of the following states:

Active: In this state the transaction is being executed. This is the initial state of every
transaction.

Partially Committed: When a transaction executes its final operation, it is said to be in
this state. After execution of all its operations, the database system performs some checks,
e.g. the consistency of the database state after applying the output of the transaction to the
database.

Failed: If any check made by database recovery system fails, the transaction is said to
be in failed state, from where it can no longer proceed further.

Aborted: If any of the checks fails and the transaction has reached the failed state, the recovery
manager rolls back all its write operations on the database to bring the database back to the state
it was in prior to the start of execution of the transaction. Transactions in this state are
called aborted. The database recovery module can select one of two operations after a
transaction aborts:
o Re-start the transaction
o Kill the transaction

Committed: If a transaction executes all its operations successfully, it is said to be
committed. All its effects are now permanently established on the database system.

DBMS STORAGE SYSTEM

Databases are stored in file formats, which contain records. At the physical level, the actual data
is stored in electromagnetic format on some device capable of storing it for a long time.
These storage devices can be broadly categorized into three types:

Primary Storage

The memory storage directly accessible to the CPU comes under this category. The
CPU's internal memory (registers), fast memory (cache) and main memory (RAM) are directly
accessible to the CPU, as they are all placed on the motherboard or CPU chipset. This storage is
typically very small, ultra-fast and volatile. It needs a continuous power supply in
order to maintain its state; in case of power failure, all data is lost.

Secondary Storage

The need to store data for a longer time and to retain it even after the power supply is
interrupted gave birth to secondary data storage. All memory devices that are not part of the
CPU chipset or motherboard come under this category: magnetic disks, optical
disks (DVD, CD, etc.), flash drives and magnetic tapes, none of which are directly accessible by the CPU.

Hard disk drives, which contain the operating system and are generally not removed from the
computer, are considered secondary storage; all the others are called tertiary storage.
Tertiary Storage

The third level in the memory hierarchy is called tertiary storage. It is used to store huge amounts
of data. Because this storage is external to the computer system, it is the slowest in speed. These
storage devices are mostly used to back up the entire system. Optical disks and magnetic tapes
are widely used as tertiary storage.
DATA QUERYING
Queries are the primary mechanism for retrieving information from a database and consist of
questions presented to the database in a predefined format. Many database management
systems use the Structured Query Language (SQL) standard query format.
Choosing parameters from a menu: In this method, the database system presents a list of
parameters from which you can choose. This is perhaps the easiest way to pose a query
because the menus guide you, but it is also the least flexible.
Query by example (QBE): In this method, the system presents a blank record and lets
you specify the fields and values that define the query.
Query language: Many database systems require you to make requests for information
in the form of a stylized query that must be written in a special query language. This is
the most complex method because it forces you to learn a specialized language, but it is
also the most powerful.
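For instance, the bank officer's request described earlier reduces to a single SQL statement (the customer table and column names are illustrative):

SELECT customer_name
FROM   customer
WHERE  customer_city = 'Hyderabad';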
DATABASE ARCHITECTURE

The design of a Database Management System highly depends on its architecture. It can be
centralized or decentralized or hierarchical. DBMS architecture can be seen as single tier or
multi-tier. n-tier architecture divides the whole system into related but independent n modules,
which can be independently modified, altered, changed or replaced.

In 1-tier architecture, the DBMS is the only entity: the user sits directly on the DBMS and uses it.
Any changes done here are done directly on the DBMS itself. This arrangement does not provide
handy tools for end users; it is preferred by database designers and programmers.
If the architecture of the DBMS is 2-tier, then there must be some application that uses the DBMS.
Programmers use 2-tier architecture when they access the DBMS by means of an application. Here
the application tier is entirely independent of the database in terms of operation, design and
programming.
3-tier architecture

The most widely used architecture is 3-tier architecture, which separates its tiers from
each other on the basis of users. It is described as follows:

Database (Data) Tier: At this tier, only database resides. Database along with its query
processing languages sits in layer-3 of 3-tier architecture. It also contains all relations and their
constraints.
Application (Middle) Tier: At this tier reside the application server and the programs that access
the database. For a user, this application tier presents an abstracted view of the database. Users
are unaware of any existence of the database beyond the application. For the database tier, the application tier
is the user of it. The database tier is not aware of any other user beyond the application tier. This tier
works as a mediator between the two.
User (Presentation) Tier: The end user sits on this tier. From the user's perspective, this tier is
everything; the user does not know about any existence or form of the database beyond this layer. At
this layer, multiple views of the database can be provided by the application. All views are
generated by applications that reside in the application tier.
A multiple-tier database architecture is highly modifiable, as almost all its components are
independent and can be changed independently.
DATABASE USERS
A DBMS is used by various users for various purposes. Some retrieve data with it and
some back it up. The users of a DBMS can be broadly categorized as follows:
Administrators: A group of users who maintain the DBMS and are responsible for administrating
the database. They are responsible for looking after its usage and by whom it should be used. They
create user access and apply limitations to maintain isolation and enforce security. Administrators
also look after DBMS resources like the system license, required software applications and tools,
and other hardware-related maintenance.
Designers: This is the group of people who actually work on the designing part of the database.
The actual database design starts with requirement analysis, followed by a good designing process.
These people keep a close watch on what data should be kept, and in what format. They identify
and design the whole set of entities, relations, constraints and views.
End Users: This group contains the people who actually take advantage of the database system.
End users can be just viewers who pay attention to logs or market rates, or they can be
as sophisticated as business analysts who make the most of the system.
Database Administrator [DBA]

Centralized control of the database is exerted by a person or group of persons under the
supervision of a high-level administrator. This person or group is referred to as the database
administrator (DBA). They are the users who are most familiar with the database and are
responsible for creating, modifying, and maintaining its three levels. The DBA manages the
DBMS's use and ensures that the database is functioning properly. The DBA administers the
three levels of the database and, in consultation with the overall user community, sets up the
definition of the global view of the various users and applications. The DBA is responsible for
the definition and implementation of the internal level, including the storage structure and
access methods to be used for the optimum performance of the DBMS. The DBA is also responsible for

granting permissions to the users of the database and for storing the profile of each user in the
database.
History of Database System

Although various rudimentary DBMSs had been in use prior to IBM Corp.'s release of
Information Management System (IMS) in 1966, IMS was the first commercially available
DBMS. IMS was considered a hierarchical database, in which standardized data records were
organized within other standardized data records, creating a hierarchy of information about a
single entry. In the late 1960s, firms like Honeywell Corp. and General Electric Corp.
developed DBMSs based on a network data model, but the next major database management
breakthrough came in 1970 when a research scientist at IBM first outlined his theory for
relational databases. Six years later, IBM completed a prototype for a relational DBMS.
In 1977, computer programmers Larry Ellison and Robert Miner co-founded Oracle Systems
Corp. Their combined experience designing specialized database programs for governmental
organizations landed the partners a $50,000 contract from the Central Intelligence Agency
(CIA) to develop a customized database program. While working on the CIA project, Ellison
and Miner became interested in IBM's efforts to develop a relational database, which involved
Structured Query Language (SQL). Recognizing that SQL would allow computer users to
retrieve data from a variety of sources and sensing that SQL would become a database industry
standard, Ellison and Miner began working on developing a program similar to the relational
DBMS being developed by IBM. In 1978, Oracle released its own relational DBMS, the
world's first relational database management system (RDBMS) using SQL. Oracle began
shipping its RDBMS the following year, nearly two years before IBM shipped its first version
of DB2, which would become a leading RDBMS competing with the database management
applications of industry giants like Microsoft Corp. and Oracle. Relational databases eventually
outpaced all other database types, mainly because they allowed for highly complex queries and
could support various tools which enhanced their usefulness.
In 1983, Oracle developed the first portable RDBMS, which allowed firms to run their DBMS
on various machines including mainframes, workstations, and personal computers. Soon
thereafter, the firm also launched a distributed DBMS, based on SQL-Star software, which
granted users the same kind of access to data stored on a network they would have if the data
were housed in a single computer. By the end of the decade, Oracle had grown into the world's
leading enterprise DBMS provider with more than $100 million in sales.
It wasn't long before DBMSs were developed for use on individual PCs. In 1993, Microsoft
Corp. created an application called Access. The program competed with FileMaker Inc.'s
FileMaker Pro, a database application initially designed for Macintosh machines.

INTRODUCTION TO DATABASE DESIGN

Major Steps in Database Design


Requirements Analysis: Talk to the potential users! Understand what data is to be
stored, and what operations and requirements are desired.

Conceptual Database Design: Develop a high-level description of the data and
constraints (we will use the ER data model).

Logical Database Design: Convert the conceptual model to a schema in the chosen
data model of the DBMS. For a relational database, this means converting the
conceptual design to a relational schema (logical schema).

Schema Refinement: Look for potential problems in the original choice of schema and
try to redesign.

Physical Database Design: Direct the DBMS's choice of underlying data layout
(e.g., indexes and clustering) in hopes of optimizing performance.

Applications and Security Design: How will the underlying database interact with
surrounding applications?

Entity-Relationship Data Model (ER)

Entity: An entity is a real-world object or concept which is distinguishable from other objects.
It may be something tangible, such as a particular student or building. It may also be somewhat
more conceptual, such as CS A-341, or an email address.
Attributes: These are used to describe a particular entity (e.g. name, SS#, height).
Domain: Each attribute comes from a specified domain (e.g., name may be a 20 character
string; SS# is a nine-digit integer)
Entity set: a collection of similar entities (i.e., those which are distinguished using the same set
of attributes). As an example, I may be an entity, whereas Faculty might be an entity set to
which I belong. Note that entity sets need not be disjoint; I may also be a member of Staff or
of Softball Players.
Key: a minimal set of attributes for an entity set, such that each entity in the set can be
uniquely identified. In some cases, there may be a single attribute (such as SS#) which serves
as a key, but in some models you might need multiple attributes as a key ("Bob from
Accounting"). There may be several possible candidate keys. We will generally designate one
such key as the primary key.
ER diagrams:

It is often helpful to visualize an ER model via a diagram. There are many variant conventions
for such diagrams; we will adapt the one used in the text.
Diagram conventions

An entity set is drawn as a rectangle.

Attributes are drawn as ovals.

Attributes which belong to the primary key are underlined.


BEYOND ER DESIGN
Objectives

Steps for designing a database.
Entities and attributes.
Relational database keys (primary keys, foreign keys, candidate keys).
Defining the attributes of entities, keys, and the relationships between entities and
attributes.

ER Model
The entity-relationship model defines the conceptual view of a database. It works around
real-world entities and the associations among them. At the view level, the ER model is
considered well suited for designing databases.
Entity

An entity is a real-world thing, either animate or inanimate, that can be easily identified and
distinguished. For example, in a school database, students, teachers, classes and courses offered
can be considered entities. All entities have some attributes or properties that give them their
identity.
An entity set is a collection of similar types of entities. An entity set may contain entities with
attributes sharing similar values. For example, a Students set may contain all the students of a

school; likewise, a Teachers set may contain all the teachers of a school from all faculties. Entity
sets need not be disjoint.

Attributes

Entities are represented by means of their properties, called attributes. All attributes have
values. For example, a student entity may have name, class, age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
Types of Attributes
Simple attribute
Simple attributes are atomic values, which cannot be divided further. For example, student's
phone-number is an atomic value of 10 digits.
Composite attribute
Composite attributes are made of more than one simple attribute. For example, a student's
complete name may have first_name and last_name.
Derived attribute
Derived attributes are attributes that do not exist physically in the database, but whose values
are derived from other attributes present in the database. For example, the average_salary of a
department need not be saved in the database, since it can be derived. For another example, age can
be derived from date_of_birth.
Single-valued attribute
Single-valued attributes contain a single value. For example: Social_Security_Number.
Multi-valued attribute
Multi-valued attributes may contain more than one value. For example, a person can have more
than one phone number, email_address, etc.
These attribute types can come together as:
o simple single-valued attributes
o simple multi-valued attributes
o composite single-valued attributes
o composite multi-valued attributes

Entity-Sets & Keys

A key is an attribute or collection of attributes that uniquely identifies an entity within an entity set.
Example: the roll_number of a student makes him/her identifiable among students.
o Super Key: A set of attributes (one or more) that collectively identifies an entity in an
entity set.
o Candidate Key: A minimal super key is called a candidate key; that is, a super key for
which no proper subset is a super key. An entity set may have more than one candidate
key.
o Primary Key: One of the candidate keys, chosen by the database designer to
uniquely identify the entity set.
Relationship

The association among entities is called a relationship. For example, an employee entity has the
relation works_at with a department. Another example is a student who enrolls in some course. Here,
works_at and enrolls are called relationships.
Relationship Set

Relationships of a similar type are collected into a relationship set. Like entities, a relationship too can have
attributes; these attributes are called descriptive attributes.
Degree of Relationship

The number of participating entities in a relationship defines the degree of the relationship.
o Binary = degree 2
o Ternary = degree 3
o n-ary = degree n
Mapping Cardinalities

Cardinality defines the number of entities in one entity set which can be associated with the
number of entities of the other set via a relationship set.
o One-to-one: One entity from entity set A can be associated with at most one entity of
entity set B, and vice versa.
o One-to-many: One entity from entity set A can be associated with more than one
entity of entity set B, but an entity from entity set B can be associated with at most
one entity of A.


o Many-to-one: More than one entity from entity set A can be associated with at most
one entity of entity set B, but one entity from entity set B can be associated with more
than one entity from entity set A.
o Many-to-many: One entity from A can be associated with more than one entity from B,
and vice versa.

Additional Features of ER Diagram


Ternary Relationship Set


A relationship set need not be an association of precisely two entities; it can involve three or
more when applicable. Here is another example from the text, in which a store has multiple
locations.

Using several entities from the same entity set

A relationship might associate several entities from the same underlying entity set, such
as in the following example, Reports_To. In this case, an additional role indicator (e.g.,
"supervisor") is used in the diagram to further distinguish the two similar entities.

Specifying additional constraints:

If you took a 'snapshot' of the relationship set at some instant in time, we would call this
an instance.

A (binary) relationship set can further be classified as either
o many-to-many
o one-to-many
o one-to-one
based on whether an individual entity from one of the underlying sets is allowed to be in more
than one such relationship at a time. The figure above contains a many-to-many relationship, as
departments may employ more than one person at a time, and an individual person may be
employed by more than one department.
Sometimes, an additional constraint exists for a given relationship set, that any entity from one
of the associated sets appears in at most one such relationship. For example, consider a
relationship set "Manages" which associates departments with employees. If a department
cannot have more than one manager, this is an example of a one-to-many relationship set (it
may be that an individual manages multiple departments).
This type of constraint is called a key constraint. It is represented in the ER diagrams by
drawing an arrow from an entity set E to a relationship set R when each entity in an instance of
E appears in at most one relationship in (a corresponding instance of) R.

An instance of this relationship is given in Figure 2.7.


If both entity sets of a relationship set have key constraints, we would call this a "one-to-one"
relationship set. In general, note that key constraints can apply to relationships between more
than two entities, as in the following example.

An instance of this relationship:

Participation Constraints
Recall that a key constraint requires that each entity of a set participate in at
most one relationship. Dual to this, we may ask whether each entity of a set is required to
participate in at least one relationship.
If this is required, we call this a total participation constraint; otherwise the participation
is partial. In our ER diagrams, we represent a total participation constraint by using
a thick line.

Weak Entities
There are times you might wish to define an entity set even though its attributes do not
formally contain a key (recall the definition for a key).
Usually, this is the case only because the information represented in such an entity set is only
interesting when combined, through an identifying relationship set, with another entity set we
call the identifying owner.
We will call such a set a weak entity set, and insist on the following:
The weak entity set must exhibit a key constraint with respect to the identifying
relationship set.
The weak entity set must have total participation in the identifying relationship set.
Together, this assures us that we can uniquely identify each entity from the weak set by
considering the primary key of its identifying owner together with a partial key from the weak
entity.
In our ER diagrams, we will represent a weak entity set by outlining the entity and the
identifying relationship set with dark lines. The required key constraint and total participation
are diagrammed with our existing conventions. We underline the partial key with a dotted line.
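When such a design is later mapped to tables, one common translation (a sketch, not prescribed by these notes) gives the weak entity a composite primary key built from its partial key plus the owner's primary key:

CREATE TABLE policy (
    policyid INTEGER PRIMARY KEY,
    cost     REAL
);

CREATE TABLE dependent (
    pname    VARCHAR(30),
    age      INTEGER,
    policyid INTEGER REFERENCES policy(policyid) ON DELETE CASCADE,
    PRIMARY KEY (pname, policyid)   -- partial key + owner's key
);

Total participation is reflected in policyid being part of the primary key (so it cannot be NULL), and deleting an owner cascades to its weak entities.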


Class Hierarchies
As with object-oriented programming, it is often convenient to classify an entity set as a
subclass of another. In this case, the child entity set inherits the attributes of the parent entity
set. We will denote this scenario using an "ISA" triangle, as in the following ER diagram:

Furthermore, we can impose additional constraints on such subclassing. By default, we will
assume that two subclasses of an entity set are disjoint. However, if we wish to allow an entity
to lie in more than one such subclass, we will specify an overlap constraint (e.g.
"Contract_Emps OVERLAPS Senior_Emps").
Dually, we can ask whether every entity in a superclass is required to lie in (at least) one
subclass. By default we will assume not, but we can specify a covering constraint if
desired (e.g. "Motorboats AND Cars COVER Motor_Vehicles").
Aggregation
Thus far, we have defined relationships to be associations between two or more entities.
However, it sometimes seems desirable to define a new relationship which associates some
entity with some other existing relationship. To do this, we will introduce a new feature to our
model called aggregation. We identify an existing relationship set by enclosing it in a larger
dashed box, and then we allow it to participate in another relationship set.
A motivating example follows:

Conceptual Design with the ER Model


It is most important to recognize that there is more than one way to model a given situation.
Our next goal is to start to compare the pros and cons of common choices.
Should a concept be modeled as an entity or an attribute?
Consider the scenario in which we want to add address information to the Employees entity set. We
might choose to add a single attribute address to the entity set. Alternatively, we could
introduce a new entity set, Addresses, and then a relationship associating employees with
addresses. What are the pros and cons?
Adding a new entity set yields a more complex model, and it should only be done when there is a need for
the complexity. For example, if some employees have multiple addresses to be associated, then
the more complex model is needed. Also, representing addresses as a separate entity would
allow a further breakdown, for example by zip code or city.
What if we wanted to modify the Works_In relationship to have both a start and an end date,
rather than just a start date? We could add one new attribute for the end date; alternatively, we
could create a new entity set Duration which represents intervals, and then
the Works_In relationship can be made ternary (associating an employee, a department and an
interval). What are the pros and cons?
If the duration is described through descriptive attributes, only a single such duration can be
modeled. That is, we could not express an employment history involving someone who left the
department yet later returned.
Should a concept be modeled as an entity or a relationship?
Consider a situation in which a manager controls several departments. Let's presume that a
company budgets a certain amount (budget) for each department. Yet it also wants managers to
have access to some discretionary budget (dbudget). There are two corporate models. A
discretionary budget may be created for each individual department; alternatively, there may be
a discretionary budget for each manager, to be used as she desires.
Which scenario is represented by the following ER diagram? If you want the alternate
interpretation, how would you adjust the model?

Should we use binary or ternary relationships?


Consider the following ER diagram, representing insurance policies owned by employees at a
company. Each employee can own several policies, each policy can be owned by several
employees, and each dependent can be covered by several policies.

What if we wish to model the following additional requirements:


A policy cannot be owned jointly by two or more employees.
Every policy must be owned by some employee.
Dependents is a weak entity set, and each dependent entity is uniquely identified by
taking pname in conjunction with the policyid of a policy entity (which, intuitively, covers the
given dependent).
The best way to model this is to switch away from the ternary relationship set, and instead use
two distinct binary relationship sets.

Should we use aggregation?


Consider again the following ER diagram:

Suppose we did not need the until or since attributes. In this case, we could model the identical setting
using the following ternary relationship:


Let's compare these two models. What if we wanted to add an additional constraint to
each, that each sponsorship (of a project by a department) be monitored by at most one
employee? Can you add this constraint to either of the above models?
RELATIONAL DATA MODEL
The relational data model is the primary data model, used widely around the world for
data storage and processing. This model is simple, and it has all the properties and capabilities
required to process data with storage efficiency.
Concepts
Tables: In the relational data model, relations are saved in the format of tables. This format stores
the relation among entities. A table has rows and columns, where rows represent records and
columns represent attributes.
Tuple: A single row of a table, which contains a single record for that relation, is called a tuple.
Relation instance: A finite set of tuples in the relational database system represents a relation
instance. Relation instances do not have duplicate tuples.
Relation schema: This describes the relation name (table name), attributes and their names.
Relation key: Each row has one or more attributes that can identify the row in the relation
(table) uniquely; these are called the relation key.
Attribute domain: Every attribute has some pre-defined value scope, known as attribute
domain.

Relational Model Constraints

Integrity Constraints: An integrity constraint (IC) is a condition specified on a
database schema that restricts the data which can be stored in an instance of the database.
If a database instance satisfies all the integrity constraints specified on the database
schema, it is a legal instance. A DBMS permits only legal instances to be stored in the
database. Many kinds of integrity constraints can be specified in the relational model:

Domain Constraints: A relation schema specifies the domain of each field in the
relation instance. These domain constraints in the schema specify the condition that
each instance of the relation has to satisfy: the values that appear in a column must be
drawn from the domain associated with that column. Thus, the domain of a field is
essentially the type of that field.

Key Constraints
A Key Constraint is a statement that a certain minimal subset of the fields of a relation is a
unique identifier for a tuple.

Super Key: An attribute, or set of attributes, that uniquely identifies a tuple within a
relation. However, a super key may contain additional attributes that are not necessary
for unique identification.
Example: The customer_id of the relation customer is sufficient to distinguish one tuple
from another; thus, customer_id is a super key. Similarly, the combination
of customer_id and customer_name is a super key for the relation customer. Here
customer_name by itself is not a super key, because several people may have the same
name. We are often interested in super keys for which no proper subset is a super key.
Such minimal super keys are called candidate keys.

Candidate Key: A super key such that no proper subset is a super key within the
relation. There are two parts to the candidate key definition:
o Two distinct tuples in a legal instance cannot have identical values in all the
fields of a key.
o No proper subset of the set of fields in a candidate key is a unique identifier for a
tuple. A relation may have several candidate keys.
Example: Suppose the combination of customer_name and customer_street is sufficient to
distinguish the members of the customer relation. Then both {customer_id} and
{customer_name, customer_street} are candidate keys.
Although customer_id and customer_name together can distinguish customer tuples,
their combination does not form a candidate key, since customer_id alone is a
candidate key.


Primary Key: The candidate key that is selected to identify tuples uniquely within the
relation. Out of all the available candidate keys, the database designer identifies one as
the primary key. The candidate keys that are not selected as the primary key are called
alternate keys.
Features of the primary key:
o Primary key will not allow duplicate values.
o Primary key will not allow null values.
o Only one primary key is allowed per table.
Example: For the student relation, we can choose student_id as the primary key.
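In SQL this choice is declared directly on the table (a sketch using the Oracle-style types seen elsewhere in these notes):

CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,  -- rejects duplicates and NULLs
    name       VARCHAR2(30)
);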

Foreign Key: Foreign keys represent the relationships between tables. A foreign key is
a column (or a group of columns) whose values are derived from the primary key of
some other table. The table in which the foreign key is defined is called the foreign
table or detail table. The table that defines the primary key and is referenced by
the foreign key is called the primary table or master table.
Features of foreign key:
o Records cannot be inserted into a detail table if corresponding records in the
master table do not exist.
o Records of the master table cannot be deleted or updated if corresponding
records in the detail table actually exist.
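A hedged sketch of a master/detail pair in SQL (table and column names are illustrative):

CREATE TABLE department (                 -- master table
    dept_id   INTEGER PRIMARY KEY,
    dept_name VARCHAR2(30)
);

CREATE TABLE employee (                   -- detail table
    eid     INTEGER PRIMARY KEY,
    ename   VARCHAR2(20),
    dept_id INTEGER REFERENCES department(dept_id)  -- foreign key
);

An INSERT into employee with a dept_id that does not exist in department is rejected, and a department row that is still referenced cannot be deleted.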

General Constraints
Domain, primary key, and foreign key constraints are considered to be a fundamental part of
the relational data model. Sometimes, however, it is necessary to specify more general
constraints.
Example: we may require that student ages be within a certain range of values. Given such an
IC, the DBMS rejects inserts and updates that violate the constraint.
Current database systems support such general constraints in the form of table
constraints and assertions. Table constraints are associated with a single table and checked
whenever that table is modified. In contrast, assertions involve several tables and are checked
whenever any of these tables is modified.
Example: a table constraint which ensures that the salary of an employee is always above 1000:
CREATE TABLE employee
    (eid integer, ename varchar2(20), salary real,
     CHECK (salary > 1000));
Example: an assertion which enforces the constraint that the number of boats plus the number of
sailors should be less than 100:

CREATE ASSERTION smallClub CHECK ((SELECT COUNT (S.sid) FROM Sailors S) +
(SELECT COUNT (B.bid) FROM Boats B) < 100);
Referential/Enforcing Integrity Constraints
This integrity constraint works on the concept of foreign keys. A key attribute of a relation can
be referred to in another relation, where it is called a foreign key.
The referential integrity constraint states that if a relation refers to a key attribute of a different or
the same relation, then that key element must exist.
Querying Relational Data:

A Relational Database Overview


A database is a means of storing information in such a way that information can be retrieved
from it. In simplest terms, a relational database is one that presents information in tables with
rows and columns. A table is referred to as a relation in the sense that it is a collection of
objects of the same type (rows). Data in a table can be related according to common keys or
concepts, and the ability to retrieve related data from a table is the basis for the term relational
database. A Database Management System (DBMS) handles the way data is stored,
maintained, and retrieved. In the case of a relational database, a Relational Database
Management System (RDBMS) performs these tasks. DBMS as used in this book is a general
term that includes RDBMS.
Logical Database Design
A logical data model is a fully-attributed data model that is independent of DBMS, technology,
data storage or organizational constraints. It typically describes data requirements from the
business point of view. While common data modeling techniques use a relational model
notation, there is no requirement that resulting data implementations must be created using
relational technologies.
Common characteristics of a logical data model:
Typically describes data requirements for a single project or major subject area.
May be integrated with other logical data models via a repository of shared entities
Typically contains 100-1000 entities, although these numbers are highly variable
depending on the scope of the data model.

Contains relationships between entities that address cardinality and nullability
(optionality) of the relationships.
Designed and developed to be independent of DBMS, data storage locations or
technologies. In fact, it may address digital and non-digital concepts.
Data attributes will typically have datatypes with precisions and lengths assigned.
Data attributes will have nullability (optionality) assigned.
Entities and attributes will have definitions.
All kinds of other meta data may be included (retention rules, privacy indicators,
volumetrics, data lineage, etc.) In fact, the diagram of a logical data model may show
only a tiny percentage of the meta data contained within the model.

A logical data model will normally be derived from and/or linked back to objects in a
conceptual data model.

INTRODUCTION TO VIEWS
A view is a virtual table in the database defined by a query. A view does not exist in the database
as a stored set of data values. To reduce redundant data to the minimum possible, Oracle allows
the creation of an object called a view.
The reasons for creating a view are:
When data security is required.
When data redundancy is to be kept to the minimum while maintaining data security.
There are 3 types of views:
A horizontal view restricts a user's access to selected rows of a table.
A vertical view restricts a user's access to selected columns of a table.
A joined view draws its data from two or three different tables and presents the query
results as a single virtual table. Once the view is defined, one can use a single-table query
against the view for requests that would otherwise each require a two- or three-table join, as
the sketch below shows.
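A sketch of a joined view and a query against it (the student and result tables are hypothetical):

CREATE VIEW student_result AS
    SELECT s.roll_no, s.name, r.marks
    FROM   student s, result r
    WHERE  s.roll_no = r.roll_no;        -- two-table join hidden in the view

SELECT name, marks FROM student_result WHERE marks > 60;  -- single-table query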
Advantages of views
Security: a user's access can be restricted to a specific set of rows of a table.
Query simplicity: by using joined views, data can be accessed from different tables.
Data integrity: if data is accessed and entered through a view, the DBMS can
automatically check the data to ensure that it meets the specified integrity constraints.

Disadvantages of views


Performance: The DBMS translates a query against the view into queries against the underlying
source tables. If a view is defined by a multi-table query, then even a simple query against the
view becomes a complicated join, and it may take a long time to complete. The same applies to
insert, delete and update operations.

Update restrictions: when a user tries to update rows of a view, the DBMS must translate
the request into an update on rows of the underlying source table. This is
possible for simple views, but more complicated views cannot be updated.

Destroying/Altering Tables and Views:

The ALTER TABLE statement changes a base table's definition. The required syntax for
the ALTER TABLE statement is:
ALTER TABLE <Table name> <alter table action>

<alter table action> ::=
    ADD [ COLUMN ] <Column definition> |
    ALTER [ COLUMN ] <Column name> SET DEFAULT default value |
    ALTER [ COLUMN ] <Column name> DROP DEFAULT |
    ALTER [ COLUMN ] <Column name> ADD SCOPE <Table name list> |
    ALTER [ COLUMN ] <Column name> DROP SCOPE {RESTRICT | CASCADE} |
    DROP [ COLUMN ] <Column name> {RESTRICT | CASCADE} |
    ADD <Table Constraint> |
    DROP CONSTRAINT <Constraint name> {RESTRICT | CASCADE}

Removing constraints
ALTER TABLE enables you to remove column or table constraints. For example, to remove
the unique constraint you just created, use
ALTER TABLE SALESMAN
DROP CONSTRAINT uk_salesmancode;

UNIT - II
RELATIONAL ALGEBRA

Relational algebra is a procedural query language: it takes instances of relations as input
and yields instances of relations as output. It uses operators to perform queries. An operator can
be either unary or binary. Operators accept relations as their input and yield relations as their
output. Relational algebra is performed recursively on a relation, and intermediate results are
also considered relations.
Fundamental operations of Relational algebra:
Select
Project
Union
Set difference
Cartesian product
Rename
These are defined briefly as follows:
Select Operation (σ)
Selects tuples that satisfy the given predicate from a relation.
Notation: σp(r)
Where p stands for the selection predicate and r stands for the relation. p is a propositional logic
formula which may use connectors like and, or and not. These terms may use relational
operators such as =, ≠, ≥, <, >, ≤.
Examples:
σsubject="database"(Books)
Output: Selects tuples from Books where subject is 'database'.
σsubject="database" and price="450"(Books)
Output: Selects tuples from Books where subject is 'database' and price is 450.
σsubject="database" and price<"450" or year>"2010"(Books)
Output: Selects tuples from Books where subject is 'database' and price is below 450, or
where the publication year is greater than 2010 (that is, published after 2010).
Project Operation (∏)

Projects the column(s) that satisfy a given predicate.

Notation: ∏A1, A2, ..., An(r)
Where A1, A2, ..., An are attribute names of relation r.
Duplicate rows are automatically eliminated, as a relation is a set.
Example:
∏subject, author(Books)
Selects and projects the columns named subject and author from the relation Books.

Union Operation (∪)
The union operation performs a binary union between two given relations and is defined as:
r ∪ s = { t | t ∈ r or t ∈ s }
Notation: r ∪ s
Where r and s are either database relations or relation result sets (temporary relations).
For a union operation to be valid, the following conditions must hold:
r and s must have the same number of attributes.
Attribute domains must be compatible.
Duplicate tuples are automatically eliminated.
Example:
∏author(Books) ∪ ∏author(Articles)
Output: Projects the names of authors who have written either a book or an article or
both.
Set Difference (−)
The result of the set difference operation is the set of tuples that are present in one relation but not in the
second relation.
Notation: r − s
Finds all tuples that are present in r but not in s.
Example:
∏author(Books) − ∏author(Articles)
Output: Gives the names of authors who have written books but not articles.
Cartesian Product (×)
Combines the information of two different relations into one.
Notation: r × s
Where r and s are relations and the output is defined as:
r × s = { q t | q ∈ r and t ∈ s }
Example:
σauthor='tutorialspoint'(Books × Articles)
Output: Yields a relation showing all books and articles written by tutorialspoint.
Rename Operation (ρ)
Results of relational algebra are also relations, but without any name. The rename operation
allows us to name the output relation. It is denoted by the lowercase Greek letter rho (ρ).
Notation: ρx(E)
Where the result of expression E is saved with the name x.
Additional operations are:
Set intersection
Assignment
Natural join

JOIN Operator
JOIN is used to combine related tuples from two relations:
In its simplest form the JOIN operator is just the cross product of the two relations.
As the join becomes more complex, tuples are removed within the cross product to
make the result of the join more meaningful.
JOIN allows you to evaluate a join condition between the attributes of the relations on
which the join is undertaken.
The notation used is
R JOIN<join condition> S
Example
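As an illustrative sketch (the table and column names here are hypothetical), a join with an explicit join condition can be written in SQL as:

SELECT e.name, d.dept_name
FROM employee e
JOIN department d
  ON e.dept_id = d.dept_id;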

Natural Join
Invariably the JOIN involves an equality test, and thus is often described as an equi-join. Such
joins result in two attributes in the resulting relation having exactly the same value. A `natural
join' will remove the duplicate attribute(s).
In most systems a natural join will require that the attributes have the same name to
identify the attribute(s) to be used in the join. This may require a renaming mechanism.
If you do use natural joins make sure that the relations do not have two attributes with
the same name by accident.
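For instance, assuming employee and department share only the column dept_id, a natural join could be written as:

SELECT *
FROM employee
NATURAL JOIN department;

The join attribute is never named explicitly: the system matches on every commonly named column, which is exactly why accidental name collisions are dangerous.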
Outer Joins
Notice that much of the data is lost when applying a join to two relations. In some cases this
lost data might hold useful information. An outer join retains the information that would have
been lost from the tables, replacing missing data with nulls.
There are three forms of the outer join, depending on which data is to be kept.
LEFT OUTER JOIN - keep data from the left-hand table
RIGHT OUTER JOIN - keep data from the right-hand table
FULL OUTER JOIN - keep data from both tables

OUTER JOIN example 1
Figure: OUTER JOIN (left/right)

OUTER JOIN example 2
Figure: OUTER JOIN (full)

Division
As the name of this operation implies, it involves dividing one relation by another. Division is
in principle a partitioning operation. Thus, 6 ÷ 2 can be paraphrased as partitioning a single
group of 6 into a number of groups of 2 - in this case, 3 groups of 2. The basic terminology
used in arithmetic will be used here as well. Thus in an expression like x ÷ y, x is the dividend
and y the divisor. Division does not always yield whole groups of the divisor, e.g. 7 ÷ 2 gives 3
groups of 2 and a remainder group of 1. Relational division too can leave remainders but, much
like integer division, we ignore remainders and focus only on constructing whole groups of the
divisor.
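SQL has no direct division operator; the usual workaround is a double NOT EXISTS. A sketch, assuming hypothetical supplies(sno, pno) and parts(pno) tables, that finds suppliers who supply all parts:

SELECT DISTINCT s.sno
FROM supplies s
WHERE NOT EXISTS (
    -- a part p for which this supplier has no matching supplies row
    SELECT *
    FROM parts p
    WHERE NOT EXISTS (
        SELECT *
        FROM supplies s2
        WHERE s2.sno = s.sno
          AND s2.pno = p.pno));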

Relational Calculus
In contrast with relational algebra, relational calculus is a non-procedural query language; that
is, it tells what to do but not how to do it.
Relational calculus exists in two forms:
Tuple relational calculus (TRC): The filtering variable ranges over tuples.
Notation: { T | Condition }
Returns all tuples T that satisfy the condition.
Examples:
{ T.name | Author(T) AND T.article = 'database' }
Output: Returns tuples with 'name' from Author who has written an article on 'database'.
TRC can be quantified as well. We can use Existential (∃) and Universal (∀) quantifiers.
{ R | ∃T ∈ Authors (T.article = 'database' AND R.name = T.name) }
Output: The query yields the same result as the previous one.
Domain relational calculus (DRC) : In DRC the filtering variable uses domain of attributes
instead of entire tuple values (as done in TRC, mentioned above).
Notation:{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
where a1, a2 are attributes and P stands for formulae built by inner attributes.

Examples:
{ <article, page, subject> | <article, page, subject> ∈ TutorialsPoint AND subject = 'database' }
Output: Yields Article, Page and Subject from the relation TutorialsPoint where Subject
is 'database'.
Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also
involves relational operators. The expressive power of tuple relational calculus and domain
relational calculus is equivalent to relational algebra.

FORMS OF BASIC SQL QUERY:


Introduction
SQL stands for Structured Query Language
SQL lets you access and manipulate databases
SQL is an ANSI (American National Standards Institute) standard
What does SQL do?
SQL can execute queries against a database
SQL can retrieve data from a database
SQL can insert records in a database
SQL can update records in a database
SQL can delete records from a database
SQL can create new databases
SQL can create new tables in a database
SQL can create stored procedures in a database
SQL can create views in a database
SQL can set permissions on tables, procedures, and views
Using SQL in Your Web Site
To build a web site that shows data from a database, you will need:
An RDBMS database program (e.g. MS Access, SQL Server, MySQL)
A server-side scripting language, like PHP or ASP
SQL, to get the data you want
HTML / CSS

RDBMS
RDBMS stands for Relational Database Management System.
RDBMS is the basis for SQL, and for all modern database systems such as MS SQL
Server, IBM DB2, Oracle, MySQL, and Microsoft Access.
The data in RDBMS is stored in database objects called tables.
A table is a collection of related data entries and it consists of columns and rows.

The relational database management system (RDBMS) model was introduced by E. F. Codd.

Table
The data in RDBMS is stored in database objects called tables. The table is a collection of
related data entries and it consists of columns and rows.
Remember, a table is the most common and simplest form of data storage in a relational
database. Following is the example of a CUSTOMERS table:
ID   NAME       AGE   ADDRESS     SALARY
1    Ramesh     32    Ahmedabad   2000
2    Khilan     25    Delhi       1500
3    Kaushik    23    Kota        2000
4    Chaitali   25    Mumbai      6500
5    Hardik     27    Bhopal      8500
6    Komal      22    MP          4500
7    Muffy      24    Indore      10000

Field
Every table is broken up into smaller entities called fields. The fields in the
CUSTOMERS table consist of ID, NAME, AGE, ADDRESS and SALARY.
A field is a column in a table that is designed to maintain specific information about
every record in the table.
Record or Row
A record, also called a row of data, is each individual entry that exists in a table. For example
there are 7 records in the above CUSTOMERS table. Following is a single row of data or
record in the CUSTOMERS table:
1    Ramesh    32    Ahmedabad    2000

A record is a horizontal entity in a table.


Column
A column is a vertical entity in a table that contains all information associated with a specific
field in a table.
For example, a column in the CUSTOMERS table is ADDRESS, which represents location
description and would consist of the following:
ADDRESS
Ahmedabad
Delhi
Kota
Mumbai
Bhopal
MP
Indore
NULL value
A NULL value in a table is a value in a field that appears to be blank, which means a field with
a NULL value is a field with no value.
It is very important to understand that a NULL value is different than a zero value or a field
that contains spaces. A field with a NULL value is one that has been left blank during record
creation.
SQL Constraints
Constraints are the rules enforced on the data columns of a table. These are used to limit the type of
data that can go into a table. This ensures the accuracy and reliability of the data in the
database.
Constraints can be column level or table level. Column level constraints are applied only to
one column, whereas table level constraints are applied to the whole table.
Following are commonly used constraints available in SQL:
NOT NULL: Ensures that a column cannot have a NULL value.
DEFAULT: Provides a default value for a column when none is specified.
UNIQUE: Ensures that all values in a column are different.
PRIMARY Key: Uniquely identifies each row/record in a database table.
FOREIGN Key: References a row/record in another database table, restricting the column's values to those present there.
CHECK Constraint: Ensures that all values in a column satisfy certain conditions.
INDEX: Used to create and retrieve data from the database very quickly.
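A minimal sketch showing several of these constraints in one table definition (the table and column names are illustrative, and the FOREIGN KEY assumes a DEPARTMENTS table exists):

CREATE TABLE STUDENTS
( ID       INT            NOT NULL,
  NAME     VARCHAR (50)   NOT NULL,
  AGE      INT            CHECK (AGE >= 16),
  DEPT_ID  INT            DEFAULT 1,
  EMAIL    VARCHAR (100)  UNIQUE,
  PRIMARY KEY (ID),
  FOREIGN KEY (DEPT_ID) REFERENCES DEPARTMENTS (ID));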
Data Integrity
The following categories of the data integrity exist with each RDBMS:
Entity: There are no duplicate rows in a table.
Domain: Enforces valid entries for a given column by restricting the type, the format,
or the range of values.
Referential: Rows cannot be deleted, which are used by other records.
User-Defined: Enforces some specific business rules that do not fall into entity, domain
or referential integrity.
SQL History
In 1971, IBM researchers created a simple non-procedural language called Structured English
Query Language, or SEQUEL. This was based on Dr. Edgar F. (Ted) Codd's design of a
relational model for data storage, where he described a universal programming language for
accessing databases.
In the late 80's, ANSI and ISO (two organizations dealing with standards for a wide
variety of things) came out with a standardized version called Structured Query Language, or
SQL. SQL is pronounced 'sequel'. There have been several versions of SQL; the latest is
SQL-99, though SQL-92 remains the most universally adopted standard.
SQL is the language used to query all databases. It is simple to learn and appears to do very
little, but it is the heart of a successful database application. Understanding SQL and using it
efficiently is essential to designing an efficient database application. The better your
understanding of SQL, the more versatile you will be in getting information out of databases. A
SQL SELECT statement can be broken down into numerous elements, each beginning with a
keyword. Although it is not necessary, common convention is to write these keywords in all
capital letters. Here, we will focus on the most fundamental and common elements of a
SELECT statement, namely
SELECT
FROM
WHERE
ORDER BY
The SELECT ... FROM Clause
The most basic SELECT statement has only 2 parts:
What columns you want to return
What table(s) those columns come from.
Examples of Basic SQL Queries:
If we want to retrieve all of the information about all of the employees in the Employees table,
we could use the asterisk (*) as a shortcut for all of the columns, and our query looks like
SELECT * FROM Employees
If we want only specific columns (as is usually the case), we can/should explicitly specify them
in a comma-separated list, as in
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
Explicitly specifying the desired fields also allows us to control the order in which the fields
are returned, so that if we wanted the last name to appear before the first name, we could write
SELECT EmployeeID, LastName, FirstName, HireDate, City FROM Employees

The WHERE Clause


The next thing we want to do is to start limiting, or filtering, the data we fetch from the
database. By adding a WHERE clause to the SELECT statement, we add one (or more)
conditions that must be met by the selected data. This will limit the number of rows that answer
the query and are fetched. In many cases, this is where most of the "action" of a query takes
place.
Examples
We can continue with our previous query, and limit it to only those employees living in
London:
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City = 'London'
If you wanted to get the opposite, the employees who do not live in London, you would write
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City <> 'London'
It is not necessary to test for equality; you can also use the standard equality/inequality
operators that you would expect. For example, to get a list of employees who were hired on or
after a given date, you would write
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE HireDate >= '1-july-1993'
Of course, we can write more complex conditions. The obvious way to do this is by having
multiple conditions in the WHERE clause. If we want to know which employees were hired
between two given dates, we could write
SELECT EmployeeID, FirstName, LastName, HireDate, City
FROM Employees
WHERE (HireDate >= '1-june-1992') AND (HireDate <= '15-december-1993')
Note that SQL also has a special BETWEEN operator that checks to see if a value is between
two values (including equality on both ends). This allows us to rewrite the previous query as
SELECT EmployeeID, FirstName, LastName, HireDate, City
FROM Employees
WHERE HireDate BETWEEN '1-june-1992' AND '15-december-1993'
We could also use the NOT operator, to fetch those rows that are not between the specified
dates:
SELECT EmployeeID, FirstName, LastName, HireDate, City
FROM Employees
WHERE HireDate NOT BETWEEN '1-june-1992' AND '15-december-1993'
Let us finish this section on the WHERE clause by looking at two additional, slightly more
sophisticated, comparison operators.
What if we want to check if a column value is equal to more than one value? If it is only 2
values, then it is easy enough to test for each of those values, combining them with the OR
operator and writing something like
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City = 'London' OR City = 'Seattle'
However, if there are three, four, or more values that we want to compare against, the above
approach quickly becomes messy. In such cases, we can use the IN operator to test against a set
of values. If we wanted to see if the City was either Seattle, Tacoma, or Redmond, we would
write
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City IN ('Seattle', 'Tacoma', 'Redmond')
As with the BETWEEN operator, here too we can reverse the results obtained and query for
those rows where City is not in the specified list:
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City NOT IN ('Seattle', 'Tacoma', 'Redmond')
Finally, the LIKE operator allows us to perform basic pattern-matching using wildcard
characters. For Microsoft SQL Server, the wildcard characters are defined as follows:

Wildcard          Description
_ (underscore)    matches any single character
% (percent)       matches a string of zero or more characters
[]                matches any single character within the specified range (e.g. [a-f]) or set (e.g. [abcdef])
[^]               matches any single character not within the specified range (e.g. [^a-f]) or set (e.g. [^abcdef])

Here too, we can opt to use the NOT operator: to find all of the employees whose first name
does not start with 'M' or 'A', we would write
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE (FirstName NOT LIKE 'M%') AND (FirstName NOT LIKE 'A%')

The ORDER BY Clause


Until now, we have been discussing filtering the data: that is, defining the conditions that
determine which rows will be included in the final set of rows to be fetched and returned from
the database. Once we have determined which columns and rows will be included in the results
of our SELECT query, we may want to control the order in which the rows appear, that is,
sorting the data.
To sort the data rows, we include the ORDER BY clause. The ORDER BY clause includes
one or more column names that specify the sort order. If we return to one of our first SELECT
statements, we can sort its results by City with the following statement:
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
ORDER BY City
If we want the sort order for a column to be descending, we can include the DESC keyword
after the column name.
The ORDER BY clause is not limited to a single column. You can include a comma-delimited
list of columns to sort by; the rows will all be sorted by the first column specified and then by
the next column specified. If we add the Country field to the SELECT clause and want to sort
by Country and City, we would write:
SELECT EmployeeID, FirstName, LastName, HireDate, Country, City
FROM Employees
ORDER BY Country, City DESC
Note that to make it interesting, we have specified the sort order for the City column to be
descending (from highest to lowest value). The sort order for the Country column is still
ascending. We could be more explicit about this by writing
SELECT EmployeeID, FirstName, LastName, HireDate, Country, City
FROM Employees
ORDER BY Country ASC, City DESC
It is important to note that a column does not need to be included in the list of selected
(returned) columns in order to be used in the ORDER BY clause. If we don't need to see/use
the Country values, but are only interested in them as the primary sorting field we could write
the query as
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
ORDER BY Country ASC, City DESC

INTRODUCTION TO NESTED QUERIES

Nested Queries
A Subquery or Inner query or Nested query is a query within another SQL query, embedded
within the WHERE clause. A subquery is used to return data that will be used in the main query
as a condition to further restrict the data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along
with the operators like =, <, >, >=, <=, IN, BETWEEN etc.
There are a few rules that subqueries must follow:
Subqueries must be enclosed within parentheses.
A subquery can have only one column in the SELECT clause, unless multiple columns
are in the main query for the subquery to compare its selected columns.

An ORDER BY cannot be used in a subquery, although the main query can use an
ORDER BY. The GROUP BY can be used to perform the same function as the ORDER
BY in a subquery.
Subqueries that return more than one row can only be used with multiple-value
operators, such as the IN operator.
The SELECT list cannot include any references to values that evaluate to a BLOB,
ARRAY, CLOB, or NCLOB.

EXISTS (sub query)


The argument of EXISTS is an arbitrary SELECT statement. The sub query is evaluated to
determine whether it returns any rows. If it returns at least one row, the result of EXISTS is
TRUE; if the sub query returns no rows, the result of EXISTS is FALSE.
The sub query can refer to variables from the surrounding query, which will act as constants
during any one evaluation of the sub query.
This simple example is like an inner join on col2, but it produces at most one output row for
each tab1 row, even if there are multiple matching tab2 rows:
SELECT col1
FROM tab1
WHERE EXISTS (SELECT 1
FROM tab2
WHERE col2 = tab1.col2);
Example "Students in Projects":
SELECT name
FROM stud
WHERE EXISTS (SELECT 1
FROM assign
WHERE stud = stud.id);

[NOT] IN


The right-hand side of this form of IN is a parenthesized list of scalar expressions. The result is
TRUE if the left-hand expression's result is equal to any of the right-hand expressions.
The right-hand side of this form of IN is a parenthesized sub query, which must return exactly
one column. The left-hand expression is evaluated and compared to each row of the sub query
result. The result of IN is TRUE if any equal sub query row is found.
SELECT id, name
FROM stud
WHERE id IN (SELECT stud
             FROM assign
             WHERE id = 1);
ANY and SOME
The right-hand side of this form of ANY is a parenthesized sub query, which must return
exactly one column. The left-hand expression is evaluated and compared to each row of the sub
query result using the given operator, which must yield a Boolean result. The result of ANY is
TRUE if any true result is obtained.
SOME is a synonym for ANY. IN is equivalent to = ANY.
ALL
The right-hand side of this form of ALL is a parenthesized sub query, which must return
exactly one column. The left-hand expression is evaluated and compared to each row of the sub
query result using the given operator, which must yield a Boolean result. The result of ALL is
TRUE if all rows yield TRUE (including the special case where the sub query returns no
rows). NOT IN is equivalent to <> ALL.
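A sketch contrasting the two, reusing the stud table from the earlier examples (the age and dept columns are assumed here for illustration):

-- students older than at least one student of department 1
SELECT name
FROM stud
WHERE age > ANY (SELECT age FROM stud WHERE dept = 1);

-- students older than every student of department 1
SELECT name
FROM stud
WHERE age > ALL (SELECT age FROM stud WHERE dept = 1);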
Row-wise comparison
The left-hand side is a list of scalar expressions. The right-hand side can be either a list of
scalar expressions of the same length, or a parenthesized sub query, which must return exactly
as many columns as there are expressions on the left-hand side. Furthermore, the sub query
cannot return more than one row. (If it returns zero rows, the result is taken to be NULL.) The
left-hand side is evaluated and compared row-wise to the single sub query result row, or to the
right-hand expression list. Presently, only = and <> operators are allowed in row-wise
comparisons. The result is TRUE if the two rows are equal or unequal, respectively.
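A sketch of a row-wise comparison against a single-row sub query, again reusing the stud table (supported in systems such as PostgreSQL; the sub query must return exactly one row):

SELECT id, name
FROM stud
WHERE (name, age) = (SELECT name, age
                     FROM stud
                     WHERE id = 1);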
CORRELATED NESTED QUERIES
SQL correlated subqueries are used to select data from a table referenced in the outer query.
The subquery is known as a correlated subquery because it is related to the outer query. In
this type of query, a table alias (also called a correlation name) must be used to specify which
table reference is to be used.
An alias is an alternative name for a table, created by placing it directly after the table
name in the FROM clause. This is suitable when anybody wants to obtain information from
two separate tables.
SELECT a.ord_num,a.ord_amount,a.cust_code,a.agent_code
FROM orders a
WHERE a.agent_code=( SELECT b.agent_code
FROM agents b
WHERE b.agent_name='Alex');

Using EXISTS
SELECT employee_id, manager_id, first_name, last_name
FROM employees a
WHERE EXISTS (SELECT employee_id
FROM employees b
WHERE b.manager_id = a.employee_id)

SET-COMPARISON OPERATORS
SQL Operators
There are two types of operators, namely comparison operators and logical operators. These
operators are used mainly in the WHERE clause and HAVING clause to filter the data to be
selected.
Comparison Operators: Comparison operators are used to compare the column data with
specific values in a condition. Comparison operators are also used along with the SELECT
statement to filter data based on specific conditions.

Comparison Operator   Description
=                     equal to
<>, !=                is not equal to
<                     less than
>                     greater than
>=                    greater than or equal to
<=                    less than or equal to

Logical Operators:There are three Logical Operators namely AND, OR and NOT.
SQL Comparison Keywords
There are other comparison keywords available in SQL which are used to enhance the search
capabilities of a SQL query. They are "IN", "BETWEEN...AND", "IS NULL", "LIKE".

Comparison Keyword   Description
LIKE                 column value is similar to specified character(s).
IN                   column value is equal to any one of a specified set of values.
BETWEEN...AND        column value is between two values, including the end values
                     specified in the range.
IS NULL              column value does not exist.
SQL LIKE Operator
The LIKE operator is used to list all rows in a table whose column values match a specified
pattern. It is useful when you want to search rows to match a specific pattern, or when you do
not know the entire value. For this purpose we use a wildcard character '%'.
To select all the students whose name begins with 'S'
SELECT first_name, last_name
FROM student_details
WHERE first_name LIKE 'S%';
The above select statement searches for all the rows where the first letter of the column
first_name is 'S' and rest of the letters in the name can be any character.
There is another wildcard character you can use with LIKE operator. It is the underscore
character, ' _ ' . In a search string, the underscore signifies a single character.
To display all the names with 'a' second character,
SELECT first_name, last_name
FROM student_details
WHERE first_name LIKE '_a%';
NOTE: Each underscore acts as a placeholder for only one character, so you can use more than
one underscore. E.g. '__i%' has two underscores before 'i'; 'S__i%' has two underscores
between 'S' and 'i'.
SQL BETWEEN ... AND Operator
The BETWEEN ... AND operator is used to compare data against a range of values.
To find the names of the students between age 10 to 15 years, the query would be like,
SELECT first_name, last_name, age
FROM student_details
WHERE age BETWEEN 10 AND 15;
SQL IN Operator
The IN operator is used when you want to compare a column with more than one value. It is
similar to an OR condition.
If you want to find the names of students who are studying either Maths or Science, the query
would be like,
SELECT first_name, last_name, subject
FROM student_details
WHERE subject IN ('Maths', 'Science');
You can include more subjects in the list like ('maths','science','history')
NOTE: The data used to compare is case sensitive.
SQL IS NULL Operator
A column value is NULL if it does not exist. The IS NULL operator is used to display all the
rows for columns that do not have a value.
If you want to find the names of students who do not participate in any games, the query would
be as given below
SELECT first_name, last_name
FROM student_details
WHERE games IS NULL
There would be no output if every student in the student_details table participates in some
game; otherwise, the names of the students who do not participate in any games would be
displayed.
AGGREGATE OPERATORS
The SQL Aggregate Functions are functions that provide mathematical operations. If you need
to add, count or perform basic statistics, these functions will be of great help.
The functions include:
count() - counts the number of rows
sum()   - computes the sum
avg()   - computes the average
min()   - computes the minimum
max()   - computes the maximum
Use of SQL Aggregate Functions
SQL aggregate functions are used as follows. If a grouping of values is needed, also include
the GROUP BY clause. Use a column name or expression as the parameter to the aggregate
function. The parameter '*' represents all rows.
SELECT <column_name1>, <column_name2>, <aggregate_function(s)>
FROM <table_name>
GROUP BY <column_name1>, <column_name2>
Example
In the following example, aggregate functions are applied to the employee_count of the branch
table. The region_nbr is the level of grouping. Here are the contents of the table:

Table: BRANCH
branch_nbr   branch_name   region_nbr   employee_count
108          New York      100          10
110          Boston        100          6
212          Chicago       200          5
404          San Diego     400          6
415          San Jose      400          3

This SQL statement with aggregate functions is executed:

SELECT region_nbr A, count(branch_nbr) B, sum(employee_count) C,
       min(employee_count) D, max(employee_count) E, avg(employee_count) F
FROM dbo.branch
GROUP BY region_nbr
ORDER BY region_nbr

Here is the result:

A     B   C    D   E    F
100   2   16   6   10   8
200   1   5    5   5    5
400   2   9    3   6    4
NULL VALUES
The SQL NULL is the term used to represent a missing value. A NULL value in a table is a
value in a field that appears to be blank. A field with a NULL value is a field with no value. It is
very important to understand that a NULL value is different from a zero value or a field that
contains spaces.
Syntax:
The basic syntax of NULL while creating a table:
CREATE TABLE CUSTOMERS
( ID       INT             NOT NULL,
  NAME     VARCHAR (20)    NOT NULL,
  AGE      INT             NOT NULL,
  ADDRESS  CHAR (25),
  SALARY   DECIMAL (18, 2),
  PRIMARY KEY (ID));
Here, NOT NULL signifies that column should always accept an explicit value of the given
data type. There are two columns where we did not use NOT NULL, which means these
columns could be NULL.
A field with a NULL value is one that has been left blank during record creation.
Example:
The NULL value can cause problems when selecting data, however, because when comparing
an unknown value to any other value, the result is always unknown and not included in the
final results.
You must use the IS NULL or IS NOT NULL operators in order to check for a NULL value.
Consider the following table, CUSTOMERS, having the following records:

ID   NAME       AGE   ADDRESS     SALARY
1    Ramesh     32    Ahmedabad   2000.00
2    Khilan     25    Delhi       1500.00
3    kaushik    23    Kota        2000.00
4    Chaitali   25    Mumbai      6500.00
5    Hardik     27    Bhopal      8500.00
6    Komal      22    MP
7    Muffy      24    Indore

Now, following is the usage of the IS NOT NULL operator:

SELECT ID, NAME, AGE, ADDRESS, SALARY
FROM CUSTOMERS
WHERE SALARY IS NOT NULL;
This would produce the following result:

ID   NAME       AGE   ADDRESS     SALARY
1    Ramesh     32    Ahmedabad   2000.00
2    Khilan     25    Delhi       1500.00
3    kaushik    23    Kota        2000.00
4    Chaitali   25    Mumbai      6500.00
5    Hardik     27    Bhopal      8500.00

Now, following is the usage of IS NULL operator:


SELECT ID, NAME, AGE, ADDRESS, SALARY
FROM CUSTOMERS
WHERE SALARY IS NULL;
This would produce the following result:

ID   NAME    AGE   ADDRESS   SALARY
6    Komal   22    MP
7    Muffy   24    Indore
SQL LOGICAL OPERATORS

There are three Logical Operators namely, AND, OR, and NOT. These operators compare two
conditions at a time to determine whether a row can be selected for the output. When retrieving
data using a SELECT statement, you can use logical operators in the WHERE clause, which
allows you to combine more than one condition.
Logical Operator   Description
OR                 For the row to be selected, at least one of the conditions must be true.
AND                For a row to be selected, all the specified conditions must be true.
NOT                For a row to be selected, the specified condition must be false.

"OR" Logical Operator

If you want to select rows that satisfy at least one of the given conditions, you can use the
logical operator, OR.
Example: if you want to find the names of students who are studying either Maths or Science,
the query would be like,
SELECT first_name, last_name, subject
FROM student_details
WHERE subject = 'Maths' OR subject = 'Science'
first_name   last_name   subject
----------   ---------   -------
Anajali      Bhagwat     Maths
Shekar       Gowda       Maths
Rahul        Sharma      Science
Stephen      Fleming     Science

The following table describes how logical "OR" operator selects a row.
Column1
Satisfied?
YES
Mr. Y SUBBA RAYUDU M. Tech

Column2
Satisfied?
YES
Page 53

Row
Selected
YES

DBMS
YES
NO
NO

NO
YES
NO

YES
YES
NO

"AND" Logical Operator


If you want to select rows that must satisfy all the given conditions, you can use the logical
operator, AND.
Example: To find the names of the students between the age 10 to 15 years, the query would be
like:
SELECT first_name, last_name, age
FROM student_details
WHERE age >= 10 AND age <= 15;
The output would be something like,
first_name   last_name   age
----------   ---------   ---
Rahul        Sharma      10
Anajali      Bhagwat     12
Shekar       Gowda       15
The following table describes how logical "AND" operator selects a row.
Column1
Satisfied?
YES
YES
NO
NO

Column2
Satisfied?
YES
NO
YES
NO

Row
Selected
YES
NO
NO
NO

"NOT" Logical Operator


If you want to find rows that do not satisfy a condition, you can use the logical operator, NOT.
NOT results in the reverse of a condition. That is, if a condition is satisfied, then the row is not
returned.
Example: If you want to find out the names of the students who do not play football, the query
would be like:
SELECT first_name, last_name, games
FROM student_details
WHERE NOT games = 'Football'

OUTER JOINS

All joins mentioned above, that is Theta Join, Equi Join and Natural Join, are called inner joins.
An inner-join process includes only tuples with matching attributes; the rest are discarded in the
resulting relation. There exist methods by which all tuples of a relation are included in the
resulting relation.
There are three kinds of outer joins:
Left outer join ( R ⟕ S )
All tuples of the left relation, R, are included in the resulting relation, and if there exist tuples in
R without any matching tuple in S, then the S-attributes of the resulting relation are made NULL.
Left
A     B
100   Database
101   Mechanics
102   Electronics

Right
A     B
100   Alex
102   Maya
104   Mira

Left outer join output

A     B             C     D
100   Database      100   Alex
101   Mechanics     ---   ---
102   Electronics   102   Maya

Right outer join ( R ⟖ S )
All tuples of the right relation, S, are included in the resulting relation, and if there exist tuples
in S without any matching tuple in R, then the R-attributes of the resulting relation are made
NULL.
Right outer join output

A     B             C     D
100   Database      100   Alex
102   Electronics   102   Maya
---   ---           104   Mira

Full outer join ( R ⟗ S )
All tuples of both participating relations are included in the resulting relation, and where there
are no matching tuples in the other relation, the respective unmatched attributes are made NULL.
Full outer join output

A     B             C     D
100   Database      100   Alex
101   Mechanics     ---   ---
102   Electronics   102   Maya
---   ---           104   Mira

DISALLOWING NULL VALUES

SQL NOT NULL Statement


Now if one wants to display the field entries whose Location is not left blank, here is a
statement example.
SELECT * FROM Employee
WHERE Location IS NOT NULL;
SQL NOT NULL Statement Output:
The NOT NULL statement will display the following results
Employee ID   Employee Name   Age   Gender   Location   Salary
1001          Henry           54    Male     New York   100000
1002          Tina            36    Female   Moscow     80000
1003          John            24    Male     London     40000
1006          Sophie          29    Female   London     60000

Complex Integrity Constraints in SQL - Triggers


An integrity constraint defines a business rule for a table column. When enabled, the rule will
be enforced by oracle (and so will always be true.) To create an integrity constraint all existing
table data must satisfy the constraint.
Default values are also subject to integrity constraint checking (defaults are included as part of
an INSERT statement before the statement is parsed.)
If the results of an INSERT or UPDATE statement violate an integrity constraint, the statement
will be rolled back.
Integrity constraints are stored as part of the table definition, (in the data dictionary.)
If multiple applications access the same table they will all adhere to the same rule.
The following integrity constraints are supported by Oracle:

NOT NULL
UNIQUE
CHECK constraints for complex integrity rules
PRIMARY KEY
FOREIGN KEY integrity constraints - referential integrity actions: ON UPDATE and ON
DELETE (DELETE CASCADE, DELETE SET NULL)

Constraint States
The current status of an integrity constraint can be changed to any of the following 4 options
using the CREATE TABLE or ALTER TABLE statement.

ENABLE - Ensure that all incoming data conforms to the constraint


DISABLE - Allow incoming data, regardless of whether it conforms to the constraint
VALIDATE - Ensure that existing data conforms to the constraint
NOVALIDATE - Allow existing data to not conform to the constraint

These can be used in combination


ENABLE { VALIDATE | NOVALIDATE }
DISABLE { VALIDATE | NOVALIDATE }
ENABLE VALIDATE is the same as ENABLE.
ENABLE NOVALIDATE means that the constraint is checked, but it does not have to be true
for all rows. This will resume constraint checking for Inserts and Updates but will not validate
any data that already exists in the table.
DISABLE NOVALIDATE is the same as DISABLE.

DISABLE VALIDATE disables the constraint, drops the index on the constraint, and
disallows any modification of the constrained columns.
For a UNIQUE constraint, this enables you to load data from a nonpartitioned table into a
partitioned table using the ALTER TABLE ... EXCHANGE PARTITION statement.
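For example, in Oracle the state of an existing constraint can be changed with ALTER TABLE ... MODIFY CONSTRAINT (the table and constraint names below are illustrative):

ALTER TABLE orders
  MODIFY CONSTRAINT fk_customer ENABLE NOVALIDATE;

New and updated rows are now checked against fk_customer, while rows that already exist in the table are not re-validated.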

TRIGGERS

A database trigger is a procedure written in PL/SQL, Java, or C that will run implicitly when
data is modified or when some user or system actions occur.
Triggers can be used in many ways e.g. to enforce complex integrity constraints or to audit data
modifications. Triggers should not be used to enforce business rules or referential integrity
rules that could be implemented with simple constraints.
Triggers are implicitly fired by Oracle when a triggering event occurs, no matter which user is
connected or which application is being used.
A row trigger is fired once for each row affected by an UPDATE statement.
A statement trigger is fired once, regardless of the number of rows in the table.
BEFORE triggers execute the trigger action before the triggering statement is executed. This
type of trigger is commonly used if the trigger will derive specific column values or if the
trigger action will determine whether the triggering statement should be allowed to complete.
Appropriate use of a BEFORE trigger can eliminate unnecessary processing of the triggering
statement.
AFTER triggers execute the trigger action after the triggering statement is executed.
For any given table you can have multiple triggers of the same type for the same statement.
E.g. multiple AFTER UPDATE triggers on the same table
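A minimal sketch of a BEFORE row trigger in Oracle PL/SQL (the employees table and the business rule are illustrative):

CREATE OR REPLACE TRIGGER trg_salary_check
BEFORE INSERT OR UPDATE OF salary ON employees
FOR EACH ROW
BEGIN
  -- reject the triggering statement before it executes
  IF :NEW.salary < 0 THEN
    RAISE_APPLICATION_ERROR(-20001, 'Salary cannot be negative');
  END IF;
END;
/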


UNIT - III
INTRODUCTION TO SCHEMA REFINEMENT

Conceptual database design gives us a set of relation schemas and integrity constraints (ICs)
that can be regarded as a good starting point for the final database design. This initial design
must be refined by taking the ICs into account more fully than is possible with just the ER
model constructs, and also by considering performance criteria and typical workloads.
Introduction to Schema Refinement:
We now present an overview of the problems that schema refinement is intended to address
and a refinement approach based on decompositions.
Redundant storage of information is the root cause of these problems.
Although decomposition can eliminate redundancy, it can lead to problems of its own and
should be used with caution.
Problems caused by Redundancy:
Redundant Storage
Update Anomalies
Insertion Anomalies
Deletion Anomalies
Hourly_Emps (SSN, Name, Lot,Rating, Hourly_wages, Hours_worked)
SSN   Name     Lot   Rating   Hourly_wages   Hours_worked
123   Rajesh   48    8        10             40
456   Ajay     22    8        10             30
326   Arun     35    5        7              30
434   Kamal    35    5        7              32
612   Nitin    35    8        10             40

Decompositions
The Problems arising from redundancy can be solved by replacing a relation with collection of
smaller relations.
A Decomposition of a relation schema R consists of replacing the relation schema by two (or
more) relation schemas that each contain a subset of attributes of R and together include all
attributes of R.
Hourly_Emps2 (SSN, Name, Lot, Rating, Hours_worked)
Wages( Rating, Hourly_wages)
Problems related to Decomposition
Unless we are careful, decomposing a relation schema can create more problems than it solves.
We need to ask two questions repeatedly
Is there reason to decompose a relation?
To answer this question, several normal forms have been proposed for relations. If a
relation schema is in one of these normal forms, we know that certain kinds of
problems cannot arise.

What problems (if any) does the decomposition cause?


With respect to the second question, two properties of decompositions are of particular
interest. The lossless-join property enables us to recover any instance of the
decomposed relation from corresponding instances of the smaller relations.

The dependency-preservation property enables us to enforce any constraint on the original


relation by simply enforcing some constraints on each of the smaller relations. That is, we need
not perform joins of the smaller relations to check whether a constraint on the original relation
is violated.
Functional Dependencies
A Functional Dependencies (FD) is a kind of IC that generalizes the concept of a key.
Let R be a relation schema & let X & Y be nonempty sets of attributes in R. then an instance r
of R satisfies the FD X Y if following holds for every pair of tuples t1 & t2 in r
If t1.X = t2.X then t1.Y = t2.Y
A    B    C    D
a1   b1   c1   d1
a1   b1   c1   d2
a1   b2   c2   d1
a2   b1   c3   d1

This instance satisfies AB → C. Adding the tuple <a1, b1, c2, d1> would violate the FD,
because it agrees with the first two tuples on AB but differs on C.
Closure of a Set of FDs
We say that an FD f is implied by a given set F of FDs if f holds on every relation instance that
satisfies all dependencies in F; that is, f holds whenever all FDs in F hold.
The set of all FDs implied by a given set F of FDs is called the closure of F, denoted by F+.
The three rules, called Armstrong's Axioms, can be applied repeatedly to infer all FDs implied
by a set F of FDs.
Armstrong's Axioms
Here X, Y and Z denote sets of attributes of relation R:
Reflexivity: If X ⊇ Y, then X → Y.
Augmentation: If X → Y, then XZ → YZ for any Z.
Transitivity: If X → Y and Y → Z, then X → Z.
Union: If X → Y and X → Z, then X → YZ.
Decomposition: If X → YZ, then X → Y and X → Z.
Contracts (contractid, supplierid, projectid, deptid, partid, qty, value)
This can be denoted as CSJDPQV.
The meaning of a tuple is that the contract with contractid C is an agreement that supplier S will
supply Q items of part P to project J associated with department D; the value V of this contract
is equal to value.
The ICs known to hold are:
1. The contract id C is a key: C → CSJDPQV
2. A project purchases a given part using a single contract: JP → C
3. A department purchases at most one part from a supplier: SD → P
Some additional FDs hold in the closure of the set of given FDs:
From JP → C, C → CSJDPQV and transitivity, JP → CSJDPQV
From SD → P and augmentation, SDJ → JP
From SDJ → JP, JP → CSJDPQV and transitivity, SDJ → CSJDPQV
From C → CSJDPQV, using decomposition: C → C, C → S, C → J, etc.
And we may obtain a number of FDs from reflexivity.
Attribute Closure
If we just want to check whether a given dependency, say X → Y, is in the closure of a set F of
FDs, we can do so efficiently without computing F+.
We first compute the attribute closure X+ with respect to F, which is the set of attributes A such
that X → A can be inferred using the Armstrong Axioms. We can find the attribute closure
using this algorithm:
closure = X
Repeat until there is no change:
{
  If there is an FD V → W in F such that V ⊆ closure,
  then set closure = closure ∪ W
}
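For example, with the Contracts FDs C → CSJDPQV, JP → C and SD → P, the closure of SDJ is computed as follows: closure starts as SDJ; SD → P adds P; JP → C adds C; and C → CSJDPQV adds the remaining attributes. Hence SDJ+ = CSJDPQV, which shows (as derived earlier) that SDJ → CSJDPQV and SDJ is a key.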
Superkey: A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the
property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

Prime Attribute An attribute of relation schema R is called a prime attribute of R if it is a
member of some candidate key of R.
A couple of additional rules (that follow from the axioms):
Union: If X → Y and X → Z, then X → YZ
e.g., if sid → acode and sid → city, then sid → acode, city
Decomposition: If X → YZ, then X → Y and X → Z
e.g., if sid → acode, city then sid → acode and sid → city
Examples: Derive the union rule from the axioms (Reflexivity, Augmentation, and Transitivity).
Derive the decomposition rule from Reflexivity and Transitivity. Corollary: Given any set of FDs F,
we can convert F into an equivalent set of FDs F', such that every FD in F' is of the form X → A,
where X is a set of attributes and A is a single attribute.
Normalization
If a database design is not perfect it may contain anomalies, which are like a bad dream for
database itself. Managing a database with anomalies is next to impossible.
Update anomalies: if data items are scattered and are not linked to each other properly,
then there may be instances when we try to update one data item that has copies
scattered at several places; a few instances get updated properly while a few are left
with their old values. This leaves the database in an inconsistent state.
Deletion anomalies: we try to delete a record, but parts of it are left undeleted because,
unknown to us, the data is also saved somewhere else.
Insert anomalies: we try to insert data in a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent
state, free from any kind of anomaly.

First Normal Form:

This is defined in the definition of relations (tables) itself. This rule defines that all the
attributes in a relation must have atomic domains. Values in atomic domain are indivisible
units.

We re-arrange the relation (table) as below, to convert it to First Normal Form

Each attribute must contain only single value from its pre-defined domain.

Second Normal Form:

Before we learn about second normal form, we need to understand the following:
Prime attribute: an attribute, which is part of the prime key, is a prime attribute.
Non-prime attribute: an attribute, which is not a part of the prime key, is said to be a
non-prime attribute.
Second normal form says that every non-prime attribute should be fully functionally
dependent on the prime key attribute. That is, if X → A holds, then there should not be any proper
subset Y of X for which Y → A also holds.

We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID.
According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent
upon both and not on any of the prime key attribute individually. But we find that Stu_Name
can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is
called partial dependency, which is not allowed in Second Normal Form.


We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.

Third Normal Form:

For a relation to be in Third Normal Form, it must be in Second Normal form and the
following must satisfy:
No non-prime attribute is transitively dependent on the prime key attribute.
For any non-trivial functional dependency X → A, either
X is a superkey or,
A is a prime attribute.

We find that in the above depicted Student_detail relation, Stu_ID is the key and the only prime
key attribute. We find that City can be identified by Stu_ID as well as by Zip itself. Neither Zip is
a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there
exists a transitive dependency.

We broke the relation as above depicted two relations to bring it into 3NF.

Boyce-Codd Normal Form:
BCNF is an extension of Third Normal Form in a strict way. BCNF states that:
For any non-trivial functional dependency X → A, X must be a super-key.
In the above depicted picture, Stu_ID is the super-key in the Student_Detail relation and Zip is the
super-key in the ZipCodes relation. So Stu_ID → Stu_Name, Zip and Zip → City confirm that both
relations are in BCNF.
Lossless-Join Decomposition:
Let R be a relation schema and let F be a set of FDs over R. A decomposition of R into two
schemas with attribute sets X and Y is said to be a lossless-join decomposition with respect to
F if, for every instance r of R that satisfies the dependencies in F, πX(r) ⋈ πY(r) = r. In
other words, we can recover the original relation from the decomposed relations.
From the definition it is easy to see that r is always a subset of the natural join of the decomposed
relations. If we take projections of a relation and recombine them using natural join, we
typically obtain some tuples that were not in the original relation.
Example:
By replacing the instance r shown below with the instances πSP(r) and πPD(r), we lose some
information.

Instance r:
S    P    D
s1   p1   d1
s2   p2   d2
s3   p1   d3

πSP(r):
S    P
s1   p1
s2   p2
s3   p1

πPD(r):
P    D
p1   d1
p2   d2
p1   d3

πSP(r) ⋈ πPD(r):
S    P    D
s1   p1   d1
s2   p2   d2
s3   p1   d3
s1   p1   d3
s3   p1   d1

Fig: Instances illustrating lossy decompositions

Theorem:

Let R be a relation and F be a set of FDs that hold over R. The decomposition of R into
relations with attribute sets R1 and R2 is lossless if and only if F+ contains either the FD
R1 ∩ R2 → R1 or the FD R1 ∩ R2 → R2.
Consider the Hourly_Emps relation. It has attributes SNLRWH, and the FD R → W
causes a violation of 3NF. We dealt with this violation by decomposing the relation
into SNLRH and RW. Since R is common to both decomposed relations and R → W
holds, this decomposition is lossless-join.

Dependency-Preserving Decomposition:
Consider the Contracts relation with attributes CSJDPQV. The given FDs are C → CSJDPQV,
JP → C, and SD → P. Because SD is not a key, the dependency SD → P causes a violation of
BCNF.
We can decompose Contracts into relations with schemas CSJDQV and SDP to address this
violation. The decomposition is lossless-join. But there is one problem: if we want to enforce
the integrity constraint JP → C, it requires an expensive join of the two relations. We say that
this decomposition is not dependency-preserving.
Let R be a relation schema that is decomposed into two schemas with attribute sets X and Y,
and let F be a set of FDs over R. The projection of F on X is the set of FDs in the closure
F+ that involve only attributes in X. We denote the projection of F on attributes X as FX. Note
that a dependency U → V in F+ is in FX only if all the attributes in U and V are in X.
The decomposition of relation schema R with FDs F into schemas with attribute sets X and Y
is dependency-preserving if (FX ∪ FY)+ = F+.
Example:
Consider the relation R with attributes ABC, decomposed into relations with attributes AB
and BC. The set of FDs over R includes A → B, B → C, and C → A.
The closure of F contains all dependencies in F plus A → C, B → A, and C → B. Consequently
FAB contains A → B and B → A, and FBC contains B → C and C → B. Therefore, FAB ∪ FBC contains
A → B, B → C, B → A and C → B. The closure of FAB ∪ FBC now includes C → A (which follows
from C → B and B → A). Thus the decomposition preserves the dependency C → A.

SCHEMA REFINEMENT IN DATABASE DESIGN

Design process
1. Determine the purpose of the database - This helps prepare for the remaining steps.
2. Find and organize the information required - Gather all of the types of information
to record in the database, such as product name and order number.
3. Divide the information into tables - Divide information items into major entities or
subjects, such as Products or Orders. Each subject then becomes a table.
4. Turn information items into columns - Decide what information needs to be stored in
each table. Each item becomes a field, and is displayed as a column in the table. For
example, an Employees table might include fields such as Last Name and Hire Date.
5. Specify primary keys - Choose each table's primary key. The primary key is a column,
or a set of columns, that is used to uniquely identify each row. An example might be
Product ID or Order ID.
6. Set up the table relationships - Look at each table and decide how the data in one
table is related to the data in other tables. Add fields to tables or create new tables to
clarify the relationships, as necessary.
7. Refine the design - Analyze the design for errors. Create tables and add a few records
of sample data. Check if results come from the tables as expected. Make adjustments to
the design, as needed.
8. Apply the normalization rules - Apply the data normalization rules to see if tables are
structured correctly. Make adjustments to the tables

Multivalued dependency
In database theory, a multivalued dependency is a full constraint between two sets of attributes
in a relation.
In contrast to the functional dependency, the multivalued dependency requires that certain
tuples be present in a relation. Therefore, a multivalued dependency is a special case of a
tuple-generating dependency. The multivalued dependency plays a role in 4NF database
normalization.
A multivalued dependency is a special case of a join dependency, with only two sets of values
involved, i.e. it is a 2-ary join dependency.
Formal definition
The formal definition is given as follows.
Let R be a relation schema and let X ⊆ R and Y ⊆ R be sets of attributes. The multivalued
dependency X ↠ Y (which can be read as "X multidetermines Y") holds on R if, in any legal
relation r(R), for all pairs of tuples t1 and t2 in r such that t1[X] = t2[X], there exist tuples
t3 and t4 in r such that:
t1[X] = t2[X] = t3[X] = t4[X]
t3[Y] = t1[Y] and t4[Y] = t2[Y]
t3[R − X − Y] = t2[R − X − Y] and t4[R − X − Y] = t1[R − X − Y]
In simpler words the above condition can be expressed as follows: if we denote by (x, y, z) the
tuple having values for X, Y and Z = R − X − Y collectively equal to x, y and z, then whenever
the tuples (a, b, c) and (a, d, e) exist in r, the tuples (a, b, e) and (a, d, c) should also exist in r.

Example
Consider this example of a relation of university courses, the books recommended for the
course, and the lecturers who will be teaching the course:

University courses
Course   Book           Lecturer
AHA      Silberschatz   John D
AHA      Nederpelt      William M
AHA      Silberschatz   William M
AHA      Nederpelt      John D
AHA      Silberschatz   Christian G
AHA      Nederpelt      Christian G
OSO      Silberschatz   John D
OSO      Silberschatz   William M

Because the lecturers attached to the course and the books attached to the course are
independent of each other, this database design has a multivalued dependency; if we were to
add a new book to the AHA course, we would have to add one record for each of the lecturers
on that course, and vice versa.
Put formally, there are two multivalued dependencies in this relation: {course} ↠ {book} and
equivalently {course} ↠ {lecturer}.
Databases with multivalued dependencies thus exhibit redundancy. In database
normalization, fourth normal form requires that either every multivalued dependency X ↠ Y is
trivial, or for every nontrivial multivalued dependency X ↠ Y, X is a superkey.
Properties
If X ↠ Y, then X ↠ R − X − Y.
If X ↠ Y and V ⊆ W, then WX ↠ VY.
If X ↠ Y and Y ↠ Z, then X ↠ Z − Y.
The following also involve functional dependencies:
If X → Y, then X ↠ Y.
If X ↠ Y and W → Z, where Z ⊆ Y and W ∩ Y = ∅, then X → Z.
The above rules are sound and complete.
A decomposition of R into (X, Y) and (X, R − Y) is a lossless-join decomposition if and
only if X ↠ Y holds in R.
Every FD is an MVD, because if X → Y, then swapping Y's between tuples that agree
on X doesn't create new tuples.
Splitting doesn't hold: as with FDs, we cannot generally split the left side of an
MVD; but unlike FDs, we cannot split the right side either - sometimes you have to
leave several attributes on the right side.
The closure of a set of MVDs is the set of all MVDs that can be inferred using the
following rules (Armstrong's axioms):
Complementation: If X ↠ Y, then X ↠ R − XY
Augmentation: If X ↠ Y and Z ⊆ W, then XW ↠ YZ
Transitivity: If X ↠ Y and Y ↠ Z, then X ↠ Z − Y
Replication: If X → Y, then X ↠ Y
Coalescence: If X ↠ Y and there exists W such that W ∩ Y = ∅, W → Z, and Z ⊆ Y,
then X → Z

Full Constraint
A constraint which expresses something about all attributes in a database (in contrast to
an embedded constraint). That a multivalued dependency is a full constraint follows from its
definition, since it says something about the attributes R − Y.

Tuple-generating dependency
A dependency which explicitly requires certain tuples to be present in the relation.

Trivial multivalued dependency 1
A multivalued dependency which involves all the attributes of a relation, i.e. X ∪ Y = R. A
trivial multivalued dependency implies, for tuples t1 and t2, tuples t3 and t4 that are equal
to t1 and t2.

Trivial multivalued dependency 2
A multivalued dependency for which Y ⊆ X.

MVD Example
Course ↠ Instructor
Course ↠ Text

Course(Y)   Instructor(X)   Text(R-XY)
Intro       Kruse           Intro to CS
Intro       Wright          Intro to CS
CS1         Thomas          Intro to Java
CS1         Thomas          CS Theory Survey
CS2         Rhodes          Java Data Structures
CS2         Rhodes          Unix
CS2         Kruse           Java Data Structures
CS2         Kruse           Unix

Fourth Normal Form:

A relation R is in 4NF if, for every MVD A ↠ B in D+, at least one of the following
holds:
A ↠ B is a trivial MVD
A is a superkey
The above example is not in 4NF.
(Course, Instructor) and (Course, Text) is a 4NF decomposition.
LJ Decomposition for 4NF Relations
The relation schemas S and T form a lossless-join decomposition for R iff
(S ∩ T) ↠ (S − T) or (S ∩ T) ↠ (T − S).
Replacing FDs with MVDs in the BCNF decomposition algorithm produces a 4NF decomposition
algorithm.
Join Dependency:
A join dependency (JD) over a relation schema R is a statement of the form ⋈[schema(R1),
schema(R2), ..., schema(Rn)], where {R1, R2, ..., Rn} is a set of relation schemas whose
attributes together make up schema(R).
A JD ⋈[schema(R1), ..., schema(Rn)] is satisfied in a relation r over R, denoted
r |= ⋈[schema(R1), ..., schema(Rn)], if r = πschema(R1)(r) ⋈ πschema(R2)(r) ⋈ ... ⋈ πschema(Rn)(r).
A JD in our banking example:
Define a new relation schema
Loan-info-schema = (branch-name, customer-name, loan-number, amount) in this
banking example.
We can define a relation loan-info(Loan-info-schema) as the set of all tuples on Loan-info-schema
such that:
The loan represented by loan-number is made by the branch named branch-name.
The loan represented by loan-number is made to the customer named customer-name.
The loan represented by loan-number is in the amount given by amount.
The preceding definition of the loan-info relation is a conjunction of three predicates: one on
loan-number and branch-name, one on loan-number and customer-name, and one on
loan-number and amount.
Fifth Normal Form (Projection-Join Normal Form)
A table is in fifth normal form (5NF) or Project-Join Normal Form (PJNF) if it is in 4NF and it
cannot have a lossless decomposition into any number of smaller tables.

Properties of 5NF:
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed
any further, then it is in 5NF.
Pairwise cyclical dependency means that:
o You always need to know two values (pairwise).
o For any one value you must know the other two (cyclical).
Example to understand 5NF
Take the following table structure as an example of a buying table. This is used to track buyers,
what they buy, and from whom they buy. Take the following sample data:

buyer    vendor           item
Sally    Liz Claiborne    Blouses
Mary     Liz Claiborne    Blouses
Sally    Jordach          Jeans
Mary     Jordach          Jeans
Sally    Jordach          Sneakers

Problem: The problem with the above table structure is that if Claiborne starts to sell Jeans,
then how many records must you create to record this fact? The problem is that there are pairwise
cyclical dependencies in the primary key: in order to determine the item you must know the buyer
and vendor; to determine the vendor you must know the buyer and the item; and to know the buyer
you must know the vendor and the item.
Solution: The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item,
and Vendor-Item. The following tables are in 5NF.

Buyer-Vendor
buyer    vendor
Sally    Liz Claiborne
Mary     Liz Claiborne
Sally    Jordach
Mary     Jordach

Buyer-Item
buyer    item
Sally    Blouses
Mary     Blouses
Sally    Jeans
Mary     Jeans
Sally    Sneakers

Vendor-Item
vendor           item
Liz Claiborne    Blouses
Jordach          Jeans
Jordach          Sneakers

Note: There is also one more normal form, Domain-Key Normal Form (DKNF), sometimes presented
as a sixth normal form (6NF). A table is in DKNF if it is in 5NF and all constraints and
dependencies that should hold on the relation can be enforced simply by enforcing the domain
constraints and the key constraints specified on the relation.

Inclusion Dependencies:
Inclusion dependencies support an essential part of the semantics of the standard relational data
model. An inclusion dependency is defined as the existence of attributes (the left term) in a
table R whose values must be a subset of the values of the corresponding attributes (the right
term) in another table S. When the right term forms a unique column or a primary key (K)
of the table S, the inclusion dependency is key-based (also named a referential integrity
restriction, rir). In this case, the left term is a foreign key (FK) in R and the restriction is stated
as R[FK] << S[K]. On the contrary, if the right term does not constitute the key of S, the
inclusion dependency is non-key-based (simply, an inclusion dependency, id). Ids are
expressed as R[X] ⊆ S[Z], with R[X] and S[Z] being the left and right terms respectively. Both
rirs and ids are often called referential constraints.

Use of Inclusion Dependencies as Domain Constraints:


Some relationships have the semantics of ids or rirs, symbolizing, essentially, domain
restrictions that indicate the legal values for an attribute. UofD business rules associated with
specific domain restrictions over dynamic and voluminous sets of values are frequently written
as ids or rirs. For example, CHECK (LeftAttrList IN (SELECT RightAttrList FROM R)) would
indicate that the set of allowable values for LeftAttrList is formed by the current set of
instances of RightAttrList in the relation R.
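As a small illustration (not part of the original notes; relation and attribute names are
hypothetical), the subset semantics of an inclusion dependency R[X] ⊆ S[Z] can be checked
directly over materialized relations. A minimal Python sketch:

    # Sketch: checking an inclusion dependency R[X] <= S[Z].
    # Relations are modeled as lists of dicts; all names are hypothetical.
    def satisfies_inclusion(r, x_attrs, s, z_attrs):
        # True iff the projection of r on x_attrs is a subset
        # of the projection of s on z_attrs.
        right = {tuple(row[a] for a in z_attrs) for row in s}
        return all(tuple(row[a] for a in x_attrs) in right for row in r)

    loan = [{"branch": "Downtown"}, {"branch": "Redwood"}]
    branch = [{"branch_name": "Downtown"}, {"branch_name": "Redwood"}]
    print(satisfies_inclusion(loan, ["branch"], branch, ["branch_name"]))  # True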

UNIT-4
TRANSACTION MANAGEMENT
Transactions:
Transaction Concept
Transaction State
Implementation of Atomicity and Durability
Concurrent Executions
Serializability
Recoverability
Implementation of Isolation
Transaction Definition in SQL
Testing for Serializability.

Transaction Concept
A transaction is a unit of program execution that accesses and possibly updates various
data items.
A transaction must see a consistent database.
During transaction execution the database may be inconsistent.
When the transaction is committed, the database must be consistent.
Two main issues to deal with:
Failures of various kinds, such as hardware failures and system crashes
Concurrent execution of multiple transactions

ACID Properties
Atomicity: Either all operations of the transaction are properly reflected in the database
or none are.
Mr. Y SUBBA RAYUDU M. Tech

Page 74

DBMS
Consistency: Execution of a transaction in isolation preserves the consistency of the
database.
Isolation: Although multiple transactions may execute concurrently, each transaction
must be unaware of other concurrently executing transactions. Intermediate transaction
results must be hidden from other concurrently executed transactions.
That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj
finished execution before Ti started, or Tj started execution after Ti finished.
Durability: After a transaction completes successfully, the changes it has made to the
database persist, even if there are system failures.
Example of Fund Transfer
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)

Consistency requirement: the sum of A and B is unchanged by the execution of the
transaction.
Atomicity requirement: if the transaction fails after step 3 and before step 6, the
system should ensure that its updates are not reflected in the database; otherwise an
inconsistency will result.
Durability requirement: once the user has been notified that the transaction has
completed (i.e., the transfer of the $50 has taken place), the updates to the database by
the transaction must persist despite failures.
Isolation requirement: if, between steps 3 and 6, another transaction is allowed to
access the partially updated database, it will see an inconsistent database
(the sum A + B will be less than it should be).

Isolation can be ensured trivially by running transactions serially, that is, one after the
other. However, executing multiple transactions concurrently has significant benefits, as
we will see.
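To make the discussion concrete, here is a hedged Python sketch (not from the notes; the
in-memory dictionary standing in for the database is hypothetical) of the six-step transfer,
in which both writes become visible together, mimicking the atomicity requirement:

    accounts = {"A": 100, "B": 200}

    def transfer(db, src, dst, amount):
        # Stage updates in the transaction's private workspace;
        # install them all at once on "commit".
        local = {}
        local[src] = db[src] - amount      # read(A); A := A - 50
        local[dst] = db[dst] + amount      # read(B); B := B + 50
        if local[src] < 0:                 # consistency check before commit
            raise ValueError("transaction aborted: insufficient funds")
        db.update(local)                   # commit: both writes or neither

    transfer(accounts, "A", "B", 50)
    assert accounts["A"] + accounts["B"] == 300   # sum of A and B unchanged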

Transaction States
Active, the initial state; the transaction stays in this state while it is executing
Partially committed, after the final statement has been executed.
Failed, after the discovery that normal execution can no longer proceed.
Aborted, after the transaction has been rolled back and the database restored to its state
prior to the start of the transaction. Two options after it has been aborted:
restart the transaction: possible only if no internal logical error caused the failure
kill the transaction
Committed, after successful completion.

Implementation of Atomicity and Durability


The recovery-management component of a database system implements the support for
atomicity and durability.
The shadow-database scheme:


assume that only one transaction is active at a time.


a pointer called db_pointer always points to the current consistent copy of the
database.
all updates are made on a shadow copy of the database, and db_pointer is made
to point to the updated shadow copy only after the transaction reaches partial
commit and all updated pages have been flushed to disk.
in case transaction fails, old consistent copy pointed to by db_pointer can be
used, and the shadow copy can be deleted.
Assumes disks do not fail.
Useful for text editors, but extremely inefficient for large databases: executing a single
transaction requires copying the entire database.

Concurrent Executions
Multiple transactions are allowed to run concurrently in the system. Advantages are:
increased processor and disk utilization, leading to better transaction
throughput: one transaction can be using the CPU while another is reading from
or writing to the disk
reduced average response time for transactions: short transactions need not
wait behind long ones.
Concurrency control schemes mechanisms to achieve isolation, i.e., to control the
interaction among the concurrent transactions in order to prevent them from destroying
the consistency of the database
Schedules
Schedules: sequences that indicate the chronological order in which instructions of
concurrent transactions are executed
a schedule for a set of transactions must consist of all instructions of those
transactions
must preserve the order in which the instructions appear in each individual
transaction.
Example Schedules
Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. The
following is a serial schedule in which T1 is followed by T2.

Schedule 1

Let T1 and T2 be the transactions defined previously. The following schedule is not a
serial schedule, but it is equivalent to Schedule 1.

Schedule 2

In both schedules above, the sum A + B is preserved.
The following concurrent schedule does not preserve the value of the sum A + B.

Serializability
Basic Assumption: Each transaction preserves database consistency.
Thus serial execution of a set of transactions preserves database consistency.
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule.
Different forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
We ignore operations other than read and write instructions, and we assume that
transactions may perform arbitrary computations on data in local buffers in between
reads and writes. Our simplified schedules consist of only read and write instructions.

Conflict Serializability
Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there
exists some item Q accessed by both li and lj, and at least one of these instructions
wrote Q.

1. li = read(Q), lj = read(Q). They don't conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.


Intuitively, a conflict between li and lj forces a (logical) temporal order between them. If
li and lj are consecutive in a schedule and they do not conflict, their results would
remain the same even if they had been interchanged in the schedule.
If a schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting instructions, we say that S and S' are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial
schedule
Example of a schedule that is not conflict serializable:

T3            T4
read(Q)
              write(Q)
write(Q)
We are unable to swap instructions in the above schedule to obtain either the serial
schedule <T3, T4>, or the serial schedule <T4, T3>.
Schedule 3 below can be transformed into Schedule 1, a serial schedule where T2
follows T1, by series of swaps of non-conflicting instructions. Therefore Schedule 3 is
conflict serializable.


Schedule 3

View Serializability
Let S and S' be two schedules with the same set of transactions. S and S' are view
equivalent if the following three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then
transaction Ti must, in schedule S', also read the initial value of Q.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and that value
was produced by transaction Tj (if any), then transaction Ti must in schedule S' also read
the value of Q that was produced by transaction Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q)
operation in schedule S must perform the final write(Q) operation in schedule S'.
As can be seen, view equivalence is based purely on reads and writes alone.
A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable.
Schedule 9, below, is a schedule which is view-serializable but not conflict serializable.

Schedule 9

Every view serializable schedule that is not conflict serializable has blind writes.

Other Notions of Serializability


Schedule 8, given below, produces the same outcome as the serial schedule <T1, T5>, yet is
not conflict equivalent or view equivalent to it.

Schedule 8

Determining such equivalence requires analysis of operations other than read and write.

Recoverability
Recoverable schedule: if a transaction Tj reads a data item previously written by a
transaction Ti, the commit operation of Ti must appear before the commit operation of Tj.
The following schedule is not recoverable if T9 commits immediately after the read.
Schedule 11

If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent
database state. Hence database must ensure that schedules are recoverable.
Cascading rollback: a single transaction failure leads to a series of transaction
rollbacks. Consider the following schedule, where none of the transactions has yet
committed (so the schedule is recoverable):


If T10 fails, T11 and T12 must also be rolled back.


Can lead to the undoing of a significant amount of work
Cascadeless schedules: cascading rollbacks cannot occur; for each pair of
transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit
operation of Ti appears before the read operation of Tj.
Every cascadeless schedule is also recoverable
It is desirable to restrict the schedules to those that are cascadeless

Implementation of Isolation
Schedules must be conflict or view serializable, and recoverable, for the sake of
database consistency, and preferably cascadeless.
A policy in which only one transaction can execute at a time generates serial schedules,
but provides a poor degree of concurrency.
Concurrency-control schemes trade off between the amount of concurrency they allow
and the amount of overhead that they incur.
Some schemes allow only conflict-serializable schedules to be generated, while others
allow view-serializable schedules that are not conflict-serializable.

Transaction Definition in SQL


Data manipulation language must include a construct for specifying the set of actions
that comprise a transaction.
In SQL, a transaction begins implicitly.
A transaction in SQL ends by:
Commit work: commits the current transaction and begins a new one.
Rollback work: causes the current transaction to abort.
Levels of consistency specified by SQL-92:
Serializable (the default)
Repeatable read
Read committed
Read uncommitted

Levels of Consistency in SQL-92


Serializable: the default.
Repeatable read: only committed records may be read, and repeated reads of the same record
must return the same value. However, a transaction may not be serializable: it may find
some records inserted by another transaction but not find others.
Read committed: only committed records can be read, but successive reads of a record
may return different (but committed) values.
Read uncommitted: even uncommitted records may be read.

Testing for Serializability


Consider some schedule of a set of transactions T1, T2, ..., Tn.
Precedence graph: a directed graph where the vertices are the transactions (names).
We draw an arc from Ti to Tj if the two transactions conflict, and Ti accessed the data item
on which the conflict arose earlier.
We may label the arc by the item that was accessed.

Example 1


Example Schedule (Schedule A)
(figure: a schedule over five transactions T1-T5 consisting of reads and writes of the data
items U, V, W, X, Y, and Z)
Precedence Graph for Schedule A

Test for Conflict Serializability


A schedule is conflict serializable if and only if its precedence graph is acyclic.

Cycle-detection algorithms exist which take order n^2 time, where n is the number of
vertices in the graph. (Better algorithms take order n + e, where e is the number of edges.)
If precedence graph is acyclic, the serializability order can be obtained by a topological
sorting of the graph. This is a linear order consistent with the partial order of the graph.
For example, a serializability order for Schedule A would be T5 -> T1 -> T3 -> T2 -> T4
(a small code sketch of this test follows).
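As a hedged sketch (not from the notes; the schedule encoding as Python tuples is an
assumption), the precedence-graph test can be written with the standard-library graphlib:
build an edge Ti -> Tj for each conflicting pair in which Ti accessed the item first, then
topologically sort; a CycleError means the schedule is not conflict serializable.

    from graphlib import TopologicalSorter, CycleError  # Python 3.9+

    def precedence_graph(schedule):
        # schedule: list of (txn, op, item) triples in execution order,
        # with op in {'r', 'w'} (a hypothetical encoding).
        preds = {t: set() for t, _, _ in schedule}
        for i, (ti, op_i, qi) in enumerate(schedule):
            for tj, op_j, qj in schedule[i + 1:]:
                if ti != tj and qi == qj and 'w' in (op_i, op_j):
                    preds[tj].add(ti)      # conflict: Ti must precede Tj
        return preds

    def serial_order(schedule):
        # An equivalent serial order, or None if the graph has a cycle.
        try:
            return list(TopologicalSorter(precedence_graph(schedule)).static_order())
        except CycleError:
            return None

    # The non-serializable example above: T3 and T4 form a cycle on Q.
    print(serial_order([("T3", 'r', "Q"), ("T4", 'w', "Q"), ("T3", 'w', "Q")]))  # None

The pairwise scan is the order-n^2 construction mentioned above; the topological sort
then yields a serial order consistent with the partial order of the graph.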

Test for View Serializability


The precedence graph test for conflict serializability must be modified to apply to a test for
view serializability.
The problem of checking if a schedule is view serializable falls in the class of NP-complete
problems.

Thus the existence of an efficient algorithm is extremely unlikely.

However practical algorithms that just check some sufficient conditions for view
serializability can still be used.

Concurrency Control vs. Serializability Tests


Testing a schedule for serializability after it has executed is a little too late!
Goal: to develop concurrency control protocols that will assure serializability. They will
generally not examine the precedence graph as it is being created; instead a protocol will
impose a discipline that avoids nonserializable schedules.
Tests for serializability help understand why a concurrency control protocol is correct.

Lock-Based Protocols
Example of a transaction performing locking:
T2: lock-S(A);
read(A);
unlock(A);
lock-S(B);
read(B);
unlock(B);
display(A+B)
Locking as above is not sufficient to guarantee serializability: if A and B get updated
in between the reads of A and B, the displayed sum would be wrong.
A locking protocol is a set of rules followed by all transactions while requesting and
releasing locks. Locking protocols restrict the set of possible schedules.

Pitfalls of Lock-Based Protocols


Consider the partial schedule

Neither T3 nor T4 can make progress: executing lock-S(B) causes T4 to wait for T3 to
release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its
lock on A.
Such a situation is called a deadlock.
To handle a deadlock one of T3 or T4 must be rolled back
and its locks released.
The potential for deadlock exists in most locking protocols. Deadlocks are a necessary
evil.
Starvation is also possible if concurrency control manager is badly designed. For
example:
A transaction may be waiting for an X-lock on an item, while a sequence of
other transactions request and are granted an S-lock on the same item.
The same transaction is repeatedly rolled back due to deadlocks.
Concurrency control manager can be designed to prevent starvation.

The Two-Phase Locking Protocol


This is a protocol which ensures conflict-serializable schedules.
Phase 1: Growing Phase
transaction may obtain locks
transaction may not release locks
Phase 2: Shrinking Phase
transaction may release locks
transaction may not obtain locks
The protocol assures serializability. It can be proved that the transactions can be
serialized in the order of their lock points (i.e. the point where a transaction acquired its
final lock).
Two-phase locking does not ensure freedom from deadlocks.
Cascading roll-back is possible under two-phase locking. To avoid this, follow a
modified protocol called strict two-phase locking. Here a transaction must hold all its
exclusive locks till it commits/aborts.
Rigorous two-phase locking is even stricter: here all locks are held till commit/abort.
In this protocol transactions can be serialized in the order in which they commit.
There can be conflict serializable schedules that cannot be obtained if two-phase
locking is used.
However, in the absence of extra information (e.g., ordering of access to data), two-phase locking is needed for conflict serializability in the following sense:
Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj
that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable.
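A minimal sketch (not from the notes; the class and method names are hypothetical) of the
two-phase discipline itself, refusing any lock acquired after the first release:

    class TwoPhaseTxn:
        # Enforces 2PL: no lock may be acquired once any lock is released.
        def __init__(self, name):
            self.name = name
            self.locks = set()
            self.shrinking = False          # becomes True at the first unlock

        def lock(self, item):
            if self.shrinking:
                raise RuntimeError(f"{self.name}: 2PL violation (growing phase over)")
            self.locks.add(item)

        def unlock(self, item):
            self.shrinking = True           # the lock point has passed
            self.locks.discard(item)

    t = TwoPhaseTxn("T1")
    t.lock("A"); t.lock("B")                # growing phase
    t.unlock("A")                           # shrinking phase begins
    # t.lock("C") would now raise RuntimeError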

Lock Conversions
Two-phase locking with lock conversions:
First Phase:
can acquire a lock-S on item
can acquire a lock-X on item
can convert a lock-S to a lock-X (upgrade)
Second Phase:
can release a lock-S
can release a lock-X
can convert a lock-X to a lock-S (downgrade)
This protocol assures serializability. But still relies on the programmer to insert the
various locking instructions.

Automatic Acquisition of Locks


A transaction Ti issues the standard read/write instruction, without explicit locking calls.
The operation read(D) is processed as:
  if Ti has a lock on D
    then read(D)
  else begin
    if necessary, wait until no other transaction has a lock-X on D;
    grant Ti a lock-S on D;
    read(D)
  end
write(D) is processed as:
  if Ti has a lock-X on D
    then write(D)
  else begin
    if necessary, wait until no other transaction has any lock on D;
    if Ti has a lock-S on D
      then upgrade lock on D to lock-X
      else grant Ti a lock-X on D;
    write(D)
  end
All locks are released after commit or abort

Implementation of Locking
A Lock manager can be implemented as a separate process to which transactions send
lock and unlock requests
The lock manager replies to a lock request by sending a lock grant messages (or a
message asking the transaction to roll back, in case of a deadlock)
The requesting transaction waits until its request is answered
The lock manager maintains a data structure called a lock table to record granted locks
and pending requests
The lock table is usually implemented as an in-memory hash table indexed on the name
of the data item being locked

Lock Table

Black rectangles indicate granted locks, white ones indicate waiting requests
Lock table also records the type of lock granted or requested
New request is added to the end of the queue of requests for the data item, and granted
if it is compatible with all earlier locks
Unlock requests result in the request being deleted, and later requests are checked to see
if they can now be granted
If transaction aborts, all waiting or granted requests of the transaction are deleted
lock manager may keep a list of locks held by each transaction, to implement
this efficiently

Graph-Based Protocols
Graph-based protocols are an alternative to two-phase locking
Impose a partial ordering on the set D = {d1, d2 ,..., dh} of all data items.
If di -> dj, then any transaction accessing both di and dj must access di before
accessing dj.
Implies that the set D may now be viewed as a directed acyclic graph, called a
database graph.
The tree-protocol is a simple kind of graph protocol.
Tree Protocol
Only exclusive locks are allowed.
The first lock by Ti may be on any data item. Subsequently, a data Q can be locked by
Ti only if the parent of Q is currently locked by Ti.
Data items may be unlocked at any time.
The tree protocol ensures conflict serializability as well as freedom from deadlock.
Unlocking may occur earlier in the tree-locking protocol than in the two-phase locking
protocol.
shorter waiting times, and increase in concurrency
protocol is deadlock-free, no rollbacks are required
the abort of a transaction can still lead to cascading rollbacks.

However, in the tree-locking protocol, a transaction may have to lock data items that it
does not access.
increased locking overhead, and additional waiting time
potential decrease in concurrency
Schedules not possible under two-phase locking are possible under tree protocol, and
vice versa.
Timestamp-Based Protocols:
Each transaction is issued a timestamp when it enters the system. If an old transaction
Ti has time-stamp TS(Ti), a new transaction Tj is assigned time-stamp TS(Tj) such that
TS(Ti) <TS(Tj).
The protocol manages concurrent execution such that the time-stamps determine the
serializability order.
In order to assure such behavior, the protocol maintains for each data Q two timestamp
values:
W-timestamp(Q) is the largest time-stamp of any transaction that executed
write(Q) successfully.
R-timestamp(Q) is the largest time-stamp of any transaction that executed
read(Q) successfully.
The timestamp ordering protocol ensures that any conflicting read and write
operations are executed in timestamp order.
Suppose a transaction Ti issues a read(Q):
1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already
overwritten. Hence, the read operation is rejected, and Ti is rolled back.
2. If TS(Ti) >= W-timestamp(Q), then the read operation is executed, and
R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).
Suppose that transaction Ti issues write(Q).
If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed
previously, and the system assumed that that value would never be produced. Hence,
the write operation is rejected, and Ti is rolled back.
If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q.
Hence, this write operation is rejected, and Ti is rolled back.
Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
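The read and write tests above translate almost line for line into code. A hedged sketch
(structures are hypothetical; the dicts W and R hold W-timestamp(Q) and R-timestamp(Q),
defaulting to 0):

    class RolledBack(Exception):
        pass

    W, R = {}, {}          # W-timestamp and R-timestamp per data item

    def read(q, ts):
        if ts < W.get(q, 0):
            raise RolledBack("read rejected: value already overwritten")
        R[q] = max(R.get(q, 0), ts)

    def write(q, ts):
        if ts < R.get(q, 0):
            raise RolledBack("write rejected: a later transaction read the old value")
        if ts < W.get(q, 0):
            raise RolledBack("write rejected: obsolete value")
        W[q] = ts

A transaction that raises RolledBack would be restarted with a new, larger timestamp.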
Example Use of the Protocol: A partial schedule for several data items for transactions
with timestamps
1, 2, 3, 4, 5
(figure: a partial schedule over T1-T5 showing reads and writes of X, Y, and Z, in which
two of the transactions abort)

Correctness of Timestamp-Ordering Protocol


The timestamp-ordering protocol guarantees serializability since all the arcs in the
precedence graph are of the form: transaction with smaller timestamp -> transaction with
larger timestamp.
Thus, there will be no cycles in the precedence graph
Timestamp protocol ensures freedom from deadlock as no transaction ever waits.
But the schedule may not be cascade-free, and may not even be recoverable

Recoverability and Cascade Freedom


Problem with the timestamp-ordering protocol:
Suppose Ti aborts, but Tj has read a data item written by Ti. Then Tj must abort;
if Tj had been allowed to commit earlier, the schedule is not recoverable.
Further, any transaction that has read a data item written by Tj must abort.
This can lead to cascading rollback, that is, a chain of rollbacks.
Solution:
A transaction is structured such that its writes are all performed at the end of its
processing
All writes of a transaction form an atomic action; no transaction may execute
while a transaction is being written
A transaction that aborts is restarted with a new timestamp

Thomas Write Rule


Modified version of the timestamp-ordering protocol in which obsolete write
operations may be ignored under certain circumstances.


When Ti attempts to write data item Q, if TS(Ti) < W-timestamp(Q), then Ti is
attempting to write an obsolete value of Q. Hence, rather than rolling back Ti as the
timestamp-ordering protocol would have done, this write operation can be ignored.
Otherwise this protocol is the same as the timestamp ordering protocol.
Thomas' Write Rule allows greater potential concurrency. Unlike previous protocols, it
allows some view-serializable schedules that are not conflict-serializable.

Validation-Based Protocol
Execution of transaction Ti is done in three phases.
1. Read and execution phase: Transaction Ti writes only to
temporary local variables
2. Validation phase: Transaction Ti performs a ``validation test''
to determine if local variables can be written without violating
serializability.
3. Write phase: If Ti is validated, the updates are applied to the
database; otherwise, Ti is rolled back.
The three phases of concurrently executing transactions can be interleaved, but each
transaction must go through the three phases in that order.


Also called optimistic concurrency control since the transaction executes fully in the
hope that all will go well during validation
Each transaction Ti has 3 timestamps
Start(Ti) : the time when Ti started its execution
Validation(Ti): the time when Ti entered its validation phase

Finish(Ti) : the time when Ti finished its write phase

Serializability order is determined by the timestamp given at validation time, to increase
concurrency. Thus TS(Ti) is given the value of Validation(Ti).
This protocol is useful and gives a greater degree of concurrency if the probability of
conflicts is low. That is because the serializability order is not pre-decided, and
relatively few transactions will have to be rolled back.
Validation Test for Transaction Tj
If for all Ti with TS(Ti) < TS(Tj) one of the following conditions holds:
finish(Ti) < start(Tj)
start(Tj) < finish(Ti) < validation(Tj), and the set of data items written by Ti does
not intersect with the set of data items read by Tj
then validation succeeds and Tj can be committed. Otherwise, validation fails and Tj is aborted.
Justification: Either the first condition is satisfied, and there is no overlapped execution, or
the second condition is satisfied and
1. the writes of Tj do not affect the reads of Ti, since they occur after Ti
has finished its reads;
2. the writes of Ti do not affect the reads of Tj, since Tj does not read
any item written by Ti.
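A direct transcription of the validation test (hedged sketch; the Txn record type and its
fields are hypothetical):

    from dataclasses import dataclass

    @dataclass
    class Txn:
        start: int
        validation: int
        finish: int
        read_set: frozenset = frozenset()
        write_set: frozenset = frozenset()

    def validate(tj, earlier):
        # earlier: every Ti with TS(Ti) < TS(Tj)
        for ti in earlier:
            if ti.finish < tj.start:
                continue        # condition 1: no overlap at all
            if (tj.start < ti.finish < tj.validation
                    and not (ti.write_set & tj.read_set)):
                continue        # condition 2: Ti's writes don't touch Tj's reads
            return False        # validation fails: abort Tj
        return True

    t1 = Txn(start=1, validation=2, finish=5, write_set=frozenset({"A"}))
    t2 = Txn(start=3, validation=6, finish=8, read_set=frozenset({"B"}))
    print(validate(t2, [t1]))   # True: overlapping, but no read-write conflict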
Schedule Produced by Validation
Example of schedule produced using validation

T1                  T2
read(B)
                    read(B)
                    B := B - 50
                    read(A)
                    A := A + 50
read(A)
(validate)
display(A+B)
                    (validate)
                    write(B)
                    write(A)

Multiple Granularity

Allow data items to be of various sizes and define a hierarchy of data granularities,
where the small granularities are nested within larger ones
Can be represented graphically as a tree (but don't confuse with tree-locking protocol)
When a transaction locks a node in the tree explicitly, it implicitly locks all the node's
descendents in the same mode.
Granularity of locking (level in tree where locking is done):
fine granularity (lower in tree): high concurrency, high locking overhead
coarse granularity (higher in tree): low locking overhead, low concurrency


Example of Granularity Hierarchy

The highest level in the example hierarchy is the entire database.


The levels below are of type area, file and record in that order.

Intention Lock Modes


In addition to S and X lock modes, there are three additional lock modes with
multiple granularity:
intention-shared (IS): indicates explicit locking at a lower level of the tree, but only
with shared locks.
intention-exclusive (IX): indicates explicit locking at a lower level with
exclusive or shared locks
shared and intention-exclusive (SIX): the subtree rooted by that node is locked
explicitly in shared mode and explicit locking is being done at a lower level
with exclusive-mode locks.
Intention locks allow a higher-level node to be locked in S or X mode without having to
check all descendent nodes.
Compatibility Matrix with Intention Lock Modes
The compatibility matrix for all lock modes is:

          IS    IX    S     SIX   X
    IS    yes   yes   yes   yes   no
    IX    yes   yes   no    no    no
    S     yes   no    yes   no    no
    SIX   yes   no    no    no    no
    X     no    no    no    no    no

(rows: mode currently held; columns: mode requested)
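A small sketch (hypothetical encoding, not from the notes) of granting requests against
this matrix: encode each row as the set of modes that may be held concurrently, then
test a request against every currently held mode.

    COMPAT = {
        "IS":  {"IS", "IX", "S", "SIX"},
        "IX":  {"IS", "IX"},
        "S":   {"IS", "S"},
        "SIX": {"IS"},
        "X":   set(),
    }

    def compatible(requested, held_modes):
        # True iff `requested` is compatible with every mode already held.
        return all(requested in COMPAT[held] for held in held_modes)

    print(compatible("IX", {"IS", "IX"}))   # True
    print(compatible("S", {"IX"}))          # False: S conflicts with IX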

Multiple Granularity Locking Scheme
Transaction Ti can lock a node Q, using the following rules:
1. The lock compatibility matrix must be observed.
2. The root of the tree must be locked first, and may be locked in
any mode.
3. A node Q can be locked by Ti in S or IS mode only if the parent
of Q is currently locked by Ti in either IX or IS mode.
4. A node Q can be locked by Ti in X, SIX, or IX mode only if the
parent of Q is currently locked by Ti in either IX or SIX mode.
5. Ti can lock a node only if it has not previously unlocked any node
(that is, Ti is two-phase).
6. Ti can unlock a node Q only if none of the children of Q are
currently locked by Ti.
Observe that locks are acquired in root-to-leaf order, whereas they are released in
leaf-to-root order.

Recovery System
Failure Classification
Storage Structure
Recovery and Atomicity
Log-Based Recovery
Shadow Paging
Recovery With Concurrent Transactions
Buffer Management
Failure with Loss of Nonvolatile Storage
Advanced Recovery Techniques

ARIES Recovery Algorithm
Remote Backup Systems

Failure Classification
Transaction failure :
Logical errors: transaction cannot complete due to some internal error condition
System errors: the database system must terminate an active transaction due to
an error condition (e.g., deadlock)
System crash: a power failure or other hardware or software failure causes the system
to crash.
Fail-stop assumption: non-volatile storage contents are assumed to not be
corrupted by system crash
Database systems have numerous integrity checks to prevent corruption
of disk data
Disk failure: a head crash or similar disk failure destroys all or part of disk storage
Destruction is assumed to be detectable: disk drives use checksums to detect
failures
Recovery Algorithms
Recovery algorithms are techniques to ensure database consistency and transaction
atomicity and durability despite failures
Recovery algorithms have two parts
Actions taken during normal transaction processing to ensure enough information
exists to recover from failures
Actions taken after a failure to recover the database contents to a state that ensures
atomicity, consistency and durability

Storage Structure
Volatile storage:
does not survive system crashes
examples: main memory, cache memory
Nonvolatile storage:
survives system crashes
examples: disk, tape, flash memory, non-volatile (battery backed up) RAM


Stable storage:
a mythical form of storage that survives all failures
approximated by maintaining multiple copies on distinct nonvolatile media
Stable-Storage Implementation
Maintain multiple copies of each block on separate disks
copies can be at remote sites to protect against disasters such as fire or flooding.
Failure during data transfer can still result in inconsistent copies: Block transfer can
result in
Successful completion
Partial failure: destination block has incorrect information
Total failure: destination block was never updated
Protecting storage media from failure during data transfer (one solution):
Execute output operation as follows (assuming two copies of each block):
Write the information onto the first physical block.
When the first write successfully completes, write the same information
onto the second physical block.
The output is completed only after the second write successfully
completes.
Copies of a block may differ due to failure during output operation. To recover from
failure:
First find inconsistent blocks:
Expensive solution: Compare the two copies of every disk block.
Better solution:
Record in-progress disk writes on non-volatile storage (Non-volatile
RAM or special area of disk).
Use this information during recovery to find blocks that may be
inconsistent, and only compare copies of these.
Used in hardware RAID systems
If either copy of an inconsistent block is detected to have an error (bad
checksum), overwrite it by the other copy. If both have no error, but are
different, overwrite the second block by the first block.
Data Access
Physical blocks are those blocks residing on the disk.
Buffer blocks are the blocks residing temporarily in main memory.
Block movements between disk and main memory are initiated through the following
two operations:
input(B) transfers the physical block B to main memory.
output(B) transfers the buffer block B to the disk, and replaces the appropriate
physical block there.
Each transaction Ti has its private work-area in which local copies of all data items
accessed and updated by it are kept.
Ti's local copy of a data item X is called xi.
We assume, for simplicity, that each data item fits in, and is stored inside, a single
block.
Transaction transfers data items between system buffer blocks and its private work-area
using the following operations :
read(X) assigns the value of data item X to the local variable xi.
write(X) assigns the value of local variable xi to data item X in the buffer
block.

both these commands may necessitate the issue of an input(BX) instruction
before the assignment, if the block BX in which X resides is not already in
memory.
Transactions
Perform read(X) while accessing X for the first time;
All subsequent accesses are to the local copy.
After last access, transaction executes write(X).
output(BX) need not immediately follow write(X). System can perform the output
operation when it deems fit.

Recovery and Atomicity


Modifying the database without ensuring that the transaction will commit may leave
the database in an inconsistent state.
Consider transaction Ti that transfers $50 from account A to account B; the goal is either to
perform all database modifications made by Ti or none at all.
Several output operations may be required for Ti (to output A and B). A failure may
occur after one of these modifications have been made but before all of them are made.
To ensure atomicity despite failures, we first output information describing the
modifications to stable storage without modifying the database itself.
We study two approaches:
log-based recovery, and
shadow-paging
We assume (initially) that transactions run serially, that is, one after the other.

Log-Based Recovery
A log is kept on stable storage.
The log is a sequence of log records, and maintains a record of update activities
on the database.
When transaction Ti starts, it registers itself by writing a <Ti start> log record.

Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1 is the
value of X before the write, and V2 is the value to be written to X.
The log record notes that Ti has performed a write on data item Xj; Xj had value
V1 before the write, and will have value V2 after the write.
When Ti finishes its last statement, the log record <Ti commit> is written.
We assume for now that log records are written directly to stable storage (that is, they
are not buffered)
Two approaches using logs
Deferred database modification
Immediate database modification
Deferred Database Modification
The deferred database modification scheme records all modifications to the log, but
defers all the writes to after partial commit.
Assume that transactions execute serially
Transaction starts by writing a <Ti start> record to the log.

A write(X) operation results in a log record <Ti, X, V>being written, where V is the
new value for X
Note: old value is not needed for this scheme
The write is not performed on X at this time, but is deferred.
When Ti partially commits, <Ti commit> is written to the log.
Finally, the log records are read and used to actually execute the previously deferred
writes.
During recovery after a crash, a transaction needs to be redone if and only if both
<Ti start> and <Ti commit> are there in the log.
Redoing a transaction Ti (redo(Ti)) sets the value of all data items updated by the
transaction to the new values.
Crashes can occur while
the transaction is executing the original updates, or
while recovery action is being taken
example transactions T0 and T1 (T0 executes before T1):

T0: read(A)              T1: read(C)
    A := A - 50              C := C - 100
    write(A)                 write(C)
    read(B)
    B := B + 50
    write(B)
Below we show the log as it appears at three instances of time.


If the log on stable storage at the time of the crash is as in case:
(a) no redo actions need to be taken;
(b) redo(T0) must be performed, since <T0 commit> is present;
(c) redo(T0) must be performed followed by redo(T1), since
<T0 commit> and <T1 commit> are present.
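A hedged sketch of the deferred-modification redo pass (the log encoding as Python tuples
is an assumption): collect the committed transactions, then replay only their writes.

    # Log records: ("start", T), ("write", T, X, new_value), ("commit", T).
    def recover_deferred(log, db):
        committed = {rec[1] for rec in log if rec[0] == "commit"}
        for rec in log:
            if rec[0] == "write" and rec[1] in committed:
                _, t, x, v = rec
                db[x] = v                        # redo: install the new value

    db = {"A": 1000, "B": 2000}
    log = [("start", "T0"), ("write", "T0", "A", 950),
           ("write", "T0", "B", 2050), ("commit", "T0"),
           ("start", "T1"), ("write", "T1", "C", 600)]   # T1 never committed
    recover_deferred(log, db)
    print(db)   # {'A': 950, 'B': 2050}: T0 redone, T1's write never applied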
Immediate Database Modification
The immediate database modification scheme allows database updates of an
uncommitted transaction to be made as the writes are issued
since undoing may be needed, update logs must have both old value and new
value
Update log record must be written before database item is written
We assume that the log record is output directly to stable storage
Can be extended to postpone log record output, so long as, prior to execution of
an output(B) operation for a data block B, all log records corresponding to
items in B are flushed to stable storage
Output of updated blocks can take place at any time before or after transaction commit
Order in which blocks are output can be different from the order in which they are
written.
Immediate Database Modification Example

Log                    Write        Output
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                       A = 950
                       B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                       C = 600
                                    BB, BC
<T1 commit>
                                    BA

Note: BX denotes the block containing X.
Recovery procedure has two operations instead of one:
undo(Ti) restores the value of all data items updated by Ti to their old values,
going backwards from the last log record for Ti.
redo(Ti) sets the value of all data items updated by Ti to the new values, going
forward from the first log record for Ti.
Both operations must be idempotent
That is, even if the operation is executed multiple times the effect is the same as
if it is executed once
Needed since operations may get re-executed during recovery
When recovering after failure:
Transaction Ti needs to be undone if the log contains the record <Ti start>
but does not contain the record <Ti commit>.
Transaction Ti needs to be redone if the log contains both the record <Ti start>
and the record <Ti commit>.
Undo operations are performed first, then redo operations.
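A hedged sketch of this undo-then-redo recovery for immediate modification (the tuple log
encoding is an assumption; note that both passes are idempotent):

    # Log records: ("start", T), ("write", T, X, old, new), ("commit", T).
    def recover_immediate(log, db):
        committed = {r[1] for r in log if r[0] == "commit"}
        for r in reversed(log):                  # undo pass: backwards
            if r[0] == "write" and r[1] not in committed:
                db[r[2]] = r[3]                  # restore the old value
        for r in log:                            # redo pass: forwards
            if r[0] == "write" and r[1] in committed:
                db[r[2]] = r[4]                  # install the new value

    db = {"A": 950, "B": 2050, "C": 600}         # state at crash, as in case (b)
    log = [("start", "T0"), ("write", "T0", "A", 1000, 950),
           ("write", "T0", "B", 2000, 2050), ("commit", "T0"),
           ("start", "T1"), ("write", "T1", "C", 700, 600)]
    recover_immediate(log, db)
    print(db)   # {'A': 950, 'B': 2050, 'C': 700}: T1 undone, T0 redone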
Immediate DB Modification Recovery Example
Below we show the log as it appears at three instances of time.

Recovery actions in each case above are:


(a) undo (T0): B is restored to 2000 and A to 1000.
(b) undo (T1) and redo (T0): C is restored to 700, and then A and B are
set to 950 and 2050 respectively.
(c) redo (T0) and redo (T1): A and B are set to 950 and 2050
respectively. Then C is set to 600
Checkpoints
Problems in the recovery procedure as discussed earlier:
1. searching the entire log is time-consuming;
2. we might unnecessarily redo transactions which have already
output their updates to the database.
Streamline recovery procedure by periodically performing checkpointing
1. Output all log records currently residing in main memory onto stable storage.
2. Output all modified buffer blocks to the disk.
3. Write a log record < checkpoint> onto stable storage.
During recovery we need to consider only the most recent transaction Ti that started
before the checkpoint, and transactions that started after Ti.
1. Scan backwards from end of log to find the most recent <checkpoint> record
2. Continue scanning backwards till a record <Ti start> is found.

3. Need only consider the part of the log following the above start record. The earlier part of
the log can be ignored during recovery, and can be erased whenever desired.
4. For all transactions (starting from Ti or later) with no <Ti commit>, execute
undo(Ti). (Done only in case of immediate modification.)
5. Scanning forward in the log, for all transactions starting from Ti or later with
a <Ti commit>, execute redo(Ti).

Example of Checkpoints

T1 can be ignored (updates already output to disk due to checkpoint)


T2 and T3 redone.
T4 undone

Shadow Paging
Shadow paging is an alternative to log-based recovery; this scheme is useful if
transactions execute serially
Idea: maintain two page tables during the lifetime of a transaction: the current page
table, and the shadow page table
Store the shadow page table in nonvolatile storage, so that the state of the database
prior to transaction execution may be recovered.
The shadow page table is never modified during execution.
To start with, both the page tables are identical. Only current page table is used for
data item accesses during execution of the transaction.
Whenever any page is about to be written for the first time
1. A copy of this page is made onto an unused page.
2. The current page table is then made to point to the copy
3. The update is performed on the copy
Sample Page Table

Example of Shadow Paging

Shadow and current page tables after write to page 4


To commit a transaction :
1. Flush all modified pages in main memory to disk
2. Output current page table to disk
3. Make the current page table the new shadow page table, as follows:
keep a pointer to the shadow page table at a fixed (known) location on disk.
to make the current page table the new shadow page table, simply update the
pointer to point to current page table on disk
Once pointer to shadow page table has been written, transaction is committed.
No recovery is needed after a crash new transactions can start right away, using the
shadow page table.
Pages not pointed to from current/shadow page table should be freed (garbage
collected).
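A toy sketch of the copy-on-write and pointer-swap commit (all structures are hypothetical;
a real implementation keeps the page tables and the db_pointer on disk):

    disk = {0: "page0-v0", 1: "page1-v0"}    # disk blocks by location
    shadow = {0: 0, 1: 1}                    # shadow page table: never modified
    current = dict(shadow)                   # current page table: starts identical
    next_free = 2

    def write_page(page_no, data):
        global next_free
        if current[page_no] == shadow[page_no]:       # first write to this page
            disk[next_free] = disk[current[page_no]]  # copy onto an unused page
            current[page_no] = next_free              # redirect the current table
            next_free += 1
        disk[current[page_no]] = data                 # update only the copy

    def commit():
        global shadow
        shadow = dict(current)    # in reality: one atomic db_pointer write on disk

    write_page(1, "page1-v1")
    commit()                      # the old copy of page 1 is now garbage
    print(disk[shadow[1]])        # 'page1-v1'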
Advantages of shadow-paging over log-based schemes
no overhead of writing log records
recovery is trivial
Disadvantages :
Copying the entire page table is very expensive
Can be reduced by using a page table structured like a B+-tree
No need to copy entire tree, only need to copy paths in the tree
that lead to updated leaf nodes
Commit overhead is high even with above extension
Need to flush every updated page, and page table
Data gets fragmented (related pages get separated on disk)
After every transaction completion, the database pages containing old versions
of modified data need to be garbage collected
Hard to extend algorithm to allow transactions to run concurrently
Easier to extend log based schemes

Recovery With Concurrent Transactions


We modify the log-based recovery schemes to allow multiple transactions to execute
concurrently.
All transactions share a single disk buffer and a single log
A buffer block can have data items updated by one or more transactions
We assume concurrency control using strict two-phase locking;
i.e. the updates of uncommitted transactions should not be visible to other
transactions
Otherwise how to perform undo if T1 updates A, then T2 updates A and
commits, and finally T1 has to abort?
Logging is done as described earlier.
Log records of different transactions may be interspersed in the log.
The checkpointing technique and actions taken on recovery have to be changed
since several transactions may be active when a checkpoint is performed.
Checkpoints are performed as before, except that the checkpoint log record is now of
the form <checkpoint L>, where L is the list of transactions active at the time of the
checkpoint.


We assume no updates are in progress while the checkpoint is carried out (will
relax this later)
When the system recovers from a crash, it first does the following:
Initialize undo-list and redo-list to empty.
Scan the log backwards from the end, stopping when the first <checkpoint L>
record is found. For each record found during the backward scan:
if the record is <Ti commit>, add Ti to redo-list;
if the record is <Ti start>, then if Ti is not in redo-list, add Ti to undo-list.
For every Ti in L, if Ti is not in redo-list, add Ti to undo-list.
At this point undo-list consists of incomplete transactions which must be undone, and
redo-list consists of finished transactions that must be redone.
Recovery now continues as follows:
Scan the log backwards from the most recent record, stopping when <Ti start>
records have been encountered for every Ti in undo-list. During the scan, perform
undo for each log record that belongs to a transaction in undo-list.
Locate the most recent <checkpoint L> record.
Scan the log forwards from the <checkpoint L> record till the end of the log.
During the scan, perform redo for each log record that belongs to a
transaction on redo-list.
Example of Recovery
Go over the steps of the recovery algorithm on the following log:
<T0 start>
<T0, A, 0, 10>
<T0 commit>
<T1 start>
<T1, B, 0, 10>
<T2 start>                /* Scan in Step 4 stops here */
<T2, C, 0, 10>
<T2, C, 10, 20>
<checkpoint {T1, T2}>
<T3 start>
<T3, A, 10, 20>
<T3, D, 0, 10>
<T3 commit>
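A hedged sketch of the backward scan that builds the two lists for this log (the tuple
encoding is an assumption; the checkpoint record carries the active list L):

    def crash_lists(log):
        undo, redo = [], []
        for kind, *rest in reversed(log):       # backward scan
            if kind == "commit":
                redo.append(rest[0])
            elif kind == "start" and rest[0] not in redo:
                undo.append(rest[0])
            elif kind == "checkpoint":          # stop at <checkpoint L>
                undo += [t for t in rest[0] if t not in redo]
                break
        return undo, redo

    log = [("start", "T0"), ("write", "T0", "A", 0, 10), ("commit", "T0"),
           ("start", "T1"), ("write", "T1", "B", 0, 10),
           ("start", "T2"), ("write", "T2", "C", 0, 10),
           ("write", "T2", "C", 10, 20), ("checkpoint", ["T1", "T2"]),
           ("start", "T3"), ("write", "T3", "A", 10, 20),
           ("write", "T3", "D", 0, 10), ("commit", "T3")]
    print(crash_lists(log))   # (['T1', 'T2'], ['T3']): undo T1 and T2, redo T3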
Log Record Buffering
Log record buffering: log records are buffered in main memory, instead of being
output directly to stable storage.
Log records are output to stable storage when a block of log records in the
buffer is full, or a log force operation is executed.
Log force is performed to commit a transaction by forcing all its log records (including
the commit record) to stable storage.
Several log records can thus be output using a single output operation, reducing the I/O
cost.
The rules below must be followed if log records are buffered:
Log records are output to stable storage in the order in which they are created.

Transaction Ti enters the commit state only when the log record
<Ti commit> has been output to stable storage.
Before a block of data in main memory is output to the database, all log records
pertaining to data in that block must have been output to stable storage.
This rule is called the write-ahead logging or WAL rule.
(Strictly speaking, WAL only requires undo information to be output.)

Database Buffering
Database maintains an in-memory buffer of data blocks
When a new block is needed, if buffer is full an existing block needs to be
removed from buffer
If the block chosen for removal has been updated, it must be output to disk
As a result of the write-ahead logging rule, if a block with uncommitted updates is
output to disk, log records with undo information for the updates are output to the log
on stable storage first.
No updates should be in progress on a block when it is output to disk. Can be ensured
as follows.
Before writing a data item, transaction acquires exclusive lock on block
containing the data item
Lock can be released once the write is completed.
Such locks held for short duration are called latches.
Before a block is output to disk, the system acquires an exclusive latch on the
block
Ensures no update can be in progress on the block

Buffer Management
Database buffer can be implemented either
in an area of real main-memory reserved for the database, or
in virtual memory
Implementing buffer in reserved main-memory has drawbacks:
Memory is partitioned before-hand between database buffer and applications,
limiting flexibility.
Needs may change, and although operating system knows best how memory
should be divided up at any time, it cannot change the partitioning of memory.
Database buffers are generally implemented in virtual memory in spite of some
drawbacks:
When operating system needs to evict a page that has been modified, to make
space for another page, the page is written to swap space on disk.
When database decides to write buffer page to disk, buffer page may be in swap
space, and may have to be read from swap space on disk and output to the
database on disk, resulting in extra I/O!
Known as dual paging problem.
Ideally when swapping out a database buffer page, operating system should pass
control to database, which in turn outputs page to database instead of to swap
space (making sure to output log records first)
Dual paging can thus be avoided, but common operating systems do not
support such functionality.

Failure with Loss of Nonvolatile Storage


So far we assumed no loss of non-volatile storage
Technique similar to checkpointing used to deal with loss of non-volatile storage
Periodically dump the entire content of the database to stable storage
No transaction may be active during the dump procedure; a procedure similar to
checkpointing must take place
Output all log records currently residing in main memory onto stable
storage.
Output all buffer blocks onto the disk.
Copy the contents of the database to stable storage.
Output a record <dump> to log on stable storage.
To recover from disk failure
restore database from most recent dump.
Consult the log and redo all transactions that committed after the dump
Can be extended to allow transactions to be active during the dump; known as a
fuzzy dump or online dump.


Will study fuzzy checkpointing later

Advanced Recovery Techniques


Support high-concurrency locking techniques, such as those used for B+-tree
concurrency control.
Operations like B+-tree insertions and deletions release locks early.
They cannot be undone by restoring old values (physical undo), since once a
lock is released, other transactions may have updated the B+-tree.
Instead, insertions (resp. deletions) are undone by executing a deletion (resp.
insertion) operation (known as logical undo).
For such operations, undo log records should contain the undo operation to be executed;
this is called logical undo logging, in contrast to physical undo logging.
Redo information is logged physically (that is, new value for each write) even for such
operations
Logical redo is very complicated since database state on disk may not be
operation consistent
Operation logging is done as follows:
When the operation starts, log <Ti, Oj, operation-begin>. Here Oj is a unique
identifier of the operation instance.
While the operation is executing, normal log records with physical redo and
physical undo information are logged.
When the operation completes, <Ti, Oj, operation-end, U> is logged, where U
contains the information needed to perform a logical undo.
If crash/rollback occurs before operation completes:
the operation-end log record is not found, and
the physical undo information is used to undo operation.
If crash/rollback occurs after the operation completes:
the operation-end log record is found, and in this case
logical undo is performed using U; the physical undo information for the
operation is ignored.
Redo of operation (after crash) still uses physical redo information.
Rollback of transaction Ti is done as follows: scan the log backwards.
1. If a log record <Ti, X, V1, V2> is found, perform the undo and log a special
redo-only log record <Ti, X, V1>.
2. If a <Ti, Oj, operation-end, U> record is found
Rollback the operation logically using the undo information U.
Updates performed during roll back are logged just like
during normal operation execution.
At the end of the operation rollback, instead of logging an
operation-end record, generate a record <Ti, Oj, operation-abort>.
Skip all preceding log records for Ti until the record <Ti, Oj, operation-begin> is found.
3. If a redo-only record is found, ignore it.
4. If a <Ti, Oj, operation-abort> record is found:
skip all preceding log records for Ti until the record
<Ti, Oj, operation-begin> is found.


5. Stop the scan when the record <Ti, start> is found
6. Add a <Ti, abort> record to the log
Some points to note:
Cases 3 and 4 above can occur only if the database crashes while a transaction is being
rolled back.
Skipping of log records as in case 4 is important to prevent multiple rollback of the
same operation.
The following actions are taken when recovering from a system crash:
1. Scan the log forward from the last <checkpoint L> record.
Repeat history by physically redoing all updates of all transactions.
Create an undo-list during the scan as follows:
undo-list is set to L initially;
whenever <Ti start> is found, Ti is added to undo-list;
whenever <Ti commit> or <Ti abort> is found, Ti is deleted from undo-list.
This brings the database to the state as of the crash, with committed as well as uncommitted
transactions having been redone.
Now undo-list contains transactions that are incomplete, that is, have neither
committed nor been fully rolled back.


2. Scan the log backwards, performing undo on log records of transactions found in undo-list.
Transactions are rolled back as described earlier.

When <Ti start> is found for a transaction Ti in undo-list, write a <Ti abort> log
record.
Stop the scan when <Ti start> records have been found for all Ti in undo-list.
This undoes the effects of incomplete transactions (those with neither commit nor
abort log records). Recovery is now complete.
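A minimal sketch of the forward redo scan that rebuilds the undo-list, under the
same assumed record formats as before (redo_physically is an illustrative stub):

# Builds the undo-list while repeating history from the last checkpoint.
def redo_physically(rec):
    pass  # apply the new value recorded in rec to the database

def redo_phase(log, checkpoint_index):
    _, L = log[checkpoint_index]            # the <checkpoint L> record
    undo_list = set(L)                      # undo-list is set to L initially
    for rec in log[checkpoint_index + 1:]:
        ti, tail = rec[0], rec[-1]
        if tail == "start":
            undo_list.add(ti)               # <Ti start>: add Ti
        elif tail in ("commit", "abort"):
            undo_list.discard(ti)           # <Ti commit>/<Ti abort>: remove Ti
        else:
            redo_physically(rec)            # repeat history
    return undo_list                        # the incomplete transactions

The returned undo-list then feeds the backward undo scan described above.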
Checkpointing is done as follows:
1. Output all log records in memory to stable storage
2. Output to disk all modified buffer blocks
3. Output to the log on stable storage a <checkpoint L> record.
Transactions are not allowed to perform any actions while checkpointing is in progress.
Fuzzy checkpointing allows transactions to proceed while the most time-consuming
parts of checkpointing are in progress, as described below.

Fuzzy checkpointing is done as follows:
Temporarily stop all updates by transactions
Write a <checkpoint L> log record and force the log to stable storage.
Note list M of modified buffer blocks
Now permit transactions to proceed with their actions
Output to disk all modified buffer blocks in list M
blocks should not be updated while being output
Follow WAL: all log records pertaining to a block must be output before
the block is output
Store a pointer to the checkpoint record in a fixed position last_checkpoint on
disk
When recovering using a fuzzy checkpoint, start scan from the checkpoint record
pointed to by last_checkpoint
Log records before last_checkpoint have their updates reflected in database on
disk, and need not be redone.
Incomplete checkpoints, where the system crashed while performing a
checkpoint, are handled safely.
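The fuzzy checkpointing sequence can be sketched as follows; every helper here is
an illustrative stub for the corresponding buffer-manager or log action, not a real
API:

# A minimal sketch of fuzzy checkpointing.
def pause_updates(): pass
def resume_updates(): pass
def write_log_record(rec): pass
def force_log(): pass
def modified_buffer_blocks(): return []
def force_log_records_for(block): pass
def output_block(block): pass
def set_last_checkpoint_pointer(): pass

def fuzzy_checkpoint(active_transactions):
    pause_updates()                          # brief pause, not a full stop
    write_log_record(("checkpoint", active_transactions))
    force_log()                              # checkpoint record on stable storage
    m = list(modified_buffer_blocks())       # note list M of modified blocks
    resume_updates()                         # transactions proceed from here on
    for block in m:
        # block must not be updated while being output (latch it in practice)
        force_log_records_for(block)         # WAL: log records output first
        output_block(block)
    set_last_checkpoint_pointer()            # fixed position: last_checkpoint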

ARIES Recovery Algorithm


ARIES is a state-of-the-art recovery method.
Incorporates numerous optimizations to reduce overheads during normal
processing and to speed up recovery
The advanced recovery algorithm we studied earlier is modeled after ARIES,
but greatly simplified by removing optimizations
Unlike the advanced recovery algorithm, ARIES
Uses log sequence number (LSN) to identify log records
Stores LSNs in pages to identify what updates have already been
applied to a database page
Physiological redo
Dirty page table to avoid unnecessary redos during recovery
Fuzzy checkpointing that only records information about dirty pages, and does
not require dirty pages to be written out at checkpoint time
Each of these is described below.
ARIES Optimizations
Physiological redo
Affected page is physically identified, action within page can be
logical
Used to reduce logging overheads

e.g. when a record is deleted and all other records have to be moved to fill the
hole:
Physiological redo can log just the record deletion.
Physical redo would require logging of the old and new values for much
of the page.
Requires the page to be output to disk atomically:
Easy to achieve with hardware RAID; also supported by some disk systems.
Incomplete page output can be detected by checksum techniques, but extra
actions are required for recovery; it is treated as a media failure.

ARIES Data Structures
Log sequence number (LSN) identifies each log record
Must be sequentially increasing
Typically an offset from beginning of log file to allow fast access
Easily extended to handle multiple log files
Each page contains a PageLSN, which is the LSN of the last log record whose effects
are reflected on the page.
To update a page:
X-latch the page, and write the log record.
Update the page.
Record the LSN of the log record in PageLSN.
Unlock the page.
A page flush to disk S-latches the page.
Thus the page state on disk is operation consistent.
Required to support physiological redo.

PageLSN is used during recovery to prevent repeated redo, thus ensuring
idempotence.
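A minimal sketch of this update protocol and of the PageLSN test during redo,
assuming a simplified Page class and a list-based log, with latching modeled by a
threading lock (all names illustrative):

import threading

class Page:
    def __init__(self, page_id):
        self.id = page_id
        self.page_lsn = 0
        self.latch = threading.Lock()
        self.data = {}

log = []

def append_log_record(rec):
    log.append(rec)
    return len(log)                       # LSN = position in the log

def update_page(page, ti, key, value):
    with page.latch:                      # X-latch the page, write log record
        lsn = append_log_record((ti, page.id, key, value))
        page.data[key] = value            # update the page
        page.page_lsn = lsn               # record LSN in PageLSN
                                          # latch released: "unlock page"

def redo_if_needed(page, lsn, key, value):
    # During recovery, redo only records the page has not yet seen;
    # this is what makes redo idempotent.
    if page.page_lsn < lsn:
        page.data[key] = value
        page.page_lsn = lsn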
Each log record contains the LSN of the previous log record of the same transaction
(the LSN in a log record may be implicit).
A special redo-only log record called a compensation log record (CLR) is used to log
actions taken during recovery that never need to be undone.
CLRs also serve the role of the operation-abort log records used in the advanced
recovery algorithm.
CLRs have a field UndoNextLSN that notes the next (earlier) record to be undone.
Records in between would already have been undone.
This is required to avoid repeated undo of already undone actions.

DirtyPageTable
List of pages in the buffer that have been updated.
Contains, for each such page:
PageLSN of the page.
RecLSN, an LSN such that log records before this LSN have already
been applied to the page version on disk.
Set to the current end of the log when a page is inserted into the
dirty page table (just before being updated).
Recorded in checkpoints; helps to minimize redo work.

Checkpoint log record
Contains:
DirtyPageTable and list of active transactions.
For each active transaction, LastLSN, the LSN of the last log record
written by the transaction.
A fixed position on disk notes the LSN of the last completed checkpoint
log record.
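These structures can be sketched with Python dataclasses; the field names here are
illustrative assumptions rather than a fixed layout:

# A minimal sketch of the ARIES bookkeeping structures.
from dataclasses import dataclass
from typing import Dict

@dataclass
class DirtyPageEntry:
    page_lsn: int      # LSN of last record whose effects are on the page
    rec_lsn: int       # records before this LSN are already on disk

@dataclass
class CheckpointRecord:
    dirty_page_table: Dict[int, DirtyPageEntry]   # page id -> entry
    active_transactions: Dict[str, int]           # Ti -> LastLSN

# Inserting a page into the dirty page table just before it is updated:
dirty_pages: Dict[int, DirtyPageEntry] = {}

def note_dirty(page_id: int, end_of_log_lsn: int) -> None:
    if page_id not in dirty_pages:
        dirty_pages[page_id] = DirtyPageEntry(
            page_lsn=end_of_log_lsn, rec_lsn=end_of_log_lsn)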

ARIES Recovery Algorithm


ARIES recovery involves three passes
Analysis pass: Determines
Which transactions to undo
Which pages were dirty (disk version not up to date) at time of crash
RedoLSN: LSN from which redo should start
Redo pass:
Repeats history, redoing all actions from RedoLSN
RecLSN and PageLSNs are used to avoid redoing actions already
reflected on page
Undo pass:
Rolls back all incomplete transactions
Transactions whose abort was complete earlier are not undone
Key idea: no need to undo these transactions: earlier undo
actions were logged, and are redone as required
ARIES Recovery: Analysis
Analysis pass
Starts from last complete checkpoint log record
Reads in DirtyPageTable from log record
Sets RedoLSN = min of the RecLSNs of all pages in DirtyPageTable.
In case no pages are dirty, RedoLSN = the checkpoint record's LSN.
Sets undo-list = list of transactions in checkpoint log record
Reads LSN of last log record for each transaction in undo-list from checkpoint
log record
Scans forward from checkpoint
If any log record found for transaction not in undo-list, adds transaction to
undo-list
Whenever an update log record is found
If page is not in DirtyPageTable, it is added with RecLSN set to LSN of
the update log record
If transaction end log record found, delete transaction from undo-list
Keeps track of last log record for each transaction in undo-list
May be needed for later undo
At end of analysis pass:
RedoLSN determines where to start redo pass
RecLSN for each page in DirtyPageTable used to minimize redo work
All transactions in undo-list need to be rolled back
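Under the assumption of dict-based log records with "lsn", "type", "ti", and
"page_id" fields (an illustrative encoding, not ARIES's actual on-disk format), the
analysis pass might be sketched as:

# A minimal sketch of the ARIES analysis pass.
def analysis_pass(log, ckpt_index):
    ckpt = log[ckpt_index]
    dirty = dict(ckpt["dirty_page_table"])      # page id -> RecLSN
    undo_list = dict(ckpt["active_txns"])       # Ti -> LastLSN
    for rec in log[ckpt_index + 1:]:
        ti = rec.get("ti")
        if ti is not None:
            undo_list[ti] = rec["lsn"]          # track last record per txn,
                                                # adding new txns to undo-list
        if rec["type"] == "update" and rec["page_id"] not in dirty:
            dirty[rec["page_id"]] = rec["lsn"]  # RecLSN = this record's LSN
        if rec["type"] == "end":
            undo_list.pop(ti, None)             # completed: drop from undo-list
    redo_lsn = min(dirty.values(), default=ckpt["lsn"])
    return redo_lsn, dirty, undo_list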
ARIES Redo Pass
Redo Pass: Repeats history by replaying every action not already reflected in the page on disk,
as follows:
Scans forward from RedoLSN. Whenever an update log record is found:
1. If the page is not in DirtyPageTable or the LSN of the log record is less than the
RecLSN of the page in DirtyPageTable, then skip the log record
2. Otherwise fetch the page from disk. If the PageLSN of the page fetched from
disk is less than the LSN of the log record, redo the log record
NOTE: if either test is negative, the effects of the log record have already appeared on
the page. The first test avoids even fetching the page from disk!
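The two tests can be sketched as follows; fetch_page and apply_update are
illustrative stubs standing in for buffer-manager calls, and records are encoded as
in the analysis-pass sketch:

# A minimal sketch of the two redo-pass tests.
def fetch_page(page_id): ...
def apply_update(page, rec): ...

def redo_pass(log, redo_lsn, dirty):
    for rec in log:
        if rec["lsn"] < redo_lsn or rec["type"] != "update":
            continue
        # Test 1: skip without even fetching the page from disk.
        if rec["page_id"] not in dirty or rec["lsn"] < dirty[rec["page_id"]]:
            continue
        page = fetch_page(rec["page_id"])
        # Test 2: redo only if the page has not already seen this record.
        if page.page_lsn < rec["lsn"]:
            apply_update(page, rec)
            page.page_lsn = rec["lsn"]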
ARIES Undo Actions
When an undo is performed for an update log record:
Generate a CLR containing the undo action performed (actions performed
during undo are logged physically or physiologically).
Set the UndoNextLSN of the CLR to the PrevLSN value of the update log
record, so that the chain of CLRs points past already-undone records.
ARIES supports partial rollback:
Used, e.g., to handle deadlocks by rolling back just enough to release the
required locks.
For example, records 3 and 4 may be rolled back first, later records 5 and 6,
and finally the transaction may be rolled back fully.

ARIES: Undo Pass


Undo pass
Performs a backward scan on the log, undoing all transactions in undo-list.
Backward scan optimized by skipping unneeded log records as follows:
The next LSN to be undone for each transaction is set to the LSN of the
last log record for the transaction found by the analysis pass.
At each step, pick the largest of these LSNs to undo, skip back to it, and
undo it.
After undoing a log record:
For ordinary log records, set the next LSN to be undone for the
transaction to the PrevLSN noted in the log record.
For compensation log records (CLRs), set the next LSN to be undone
to the UndoNextLSN noted in the log record.
All intervening records are skipped, since they would already
have been undone.

Undo operations are performed as described earlier.
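A minimal sketch of this backward scan, using a max-heap over the next LSN to
undo per transaction (records encoded as in the earlier sketches; write_clr and
undo_update are illustrative stubs):

import heapq

# A minimal sketch of the ARIES undo pass.
def write_clr(rec): ...
def undo_update(rec): ...

def undo_pass(log_by_lsn, undo_list):
    # undo_list: Ti -> LSN of the transaction's last record (from analysis)
    heap = [(-lsn, ti) for ti, lsn in undo_list.items()]
    heapq.heapify(heap)
    while heap:
        neg_lsn, ti = heapq.heappop(heap)      # largest LSN still to be undone
        rec = log_by_lsn[-neg_lsn]
        if rec["type"] == "update":
            write_clr(rec)                     # CLR with UndoNextLSN = PrevLSN
            undo_update(rec)
            next_lsn = rec["prev_lsn"]
        elif rec["type"] == "CLR":
            next_lsn = rec["undo_next_lsn"]    # skip already-undone records
        else:
            next_lsn = rec["prev_lsn"]
        if next_lsn is not None:               # None once <Ti start> is reached
            heapq.heappush(heap, (-next_lsn, ti))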


Other ARIES Features
Recovery Independence
Pages can be recovered independently of others
E.g. if some disk pages fail they can be recovered from a backup while
other pages are being used
Savepoints:
Transactions can record savepoints and roll back to a savepoint
Useful for complex transactions
Also used to roll back just enough to release locks on deadlock.
Fine-grained locking:
Index concurrency algorithms that permit tuple level locking on indices can be
used
These require logical undo, rather than physical undo, as in the advanced
recovery algorithm.
Recovery optimizations: For example:
Dirty page table can be used to prefetch pages during redo
Out-of-order redo is possible:
redo can be postponed on a page being fetched from disk, and
performed when the page is fetched;
meanwhile, other log records can continue to be processed.

Remote Backup Systems


Remote backup systems provide high availability by allowing transaction processing to
continue even if the primary site is destroyed.


Detection of failure: The backup site must detect when the primary site has failed.
To distinguish primary-site failure from link failure, maintain several
communication links between the primary and the remote backup.
Transfer of control:
To take over control, the backup site first performs recovery using its copy of
the database and all the log records it has received from the primary.
Thus, completed transactions are redone and incomplete transactions are
rolled back.
When the backup site takes over processing, it becomes the new primary.
To transfer control back to the old primary when it recovers, the old primary
must receive redo logs from the old backup and apply all updates locally.
Time to recover: To reduce the delay in takeover, the backup site periodically processes
the redo log records (in effect, performing recovery from the previous database state),
performs a checkpoint, and can then delete earlier parts of the log.
A hot-spare configuration permits very fast takeover:
The backup continually processes redo log records as they arrive, applying the
updates locally.
When failure of the primary is detected, the backup rolls back incomplete
transactions and is ready to process new transactions.
Alternative to remote backup: distributed database with replicated data
Remote backup is faster and cheaper, but less tolerant to failure
Ensure durability of updates by delaying transaction commit until the update is logged at
the backup; avoid this delay by permitting lower degrees of durability.
One-safe: commit as soon as the transaction's commit log record is written at the
primary.
Problem: updates may not arrive at the backup before it takes over.
Two-very-safe: commit when the transaction's commit log record is written at both the
primary and the backup.
Reduces availability, since transactions cannot commit if either site fails.
Two-safe: proceed as in two-very-safe if both primary and backup are active. If only
the primary is active, the transaction commits as soon as its commit log record is written
at the primary.
Better availability than two-very-safe; avoids the problem of lost transactions in
one-safe.
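The three rules can be contrasted in a short sketch; the helpers are illustrative
placeholders for the actual log-shipping machinery, not a real API:

# A minimal sketch contrasting the three durability levels.
def write_commit_record_at_primary(txn): ...
def wait_for_backup_ack(txn): ...

def commit(txn, mode, backup_alive):
    write_commit_record_at_primary(txn)
    if mode == "one-safe":
        return True                      # commit without waiting for backup
    if mode == "two-very-safe":
        if not backup_alive:
            return False                 # cannot commit while backup is down
        wait_for_backup_ack(txn)
        return True
    if mode == "two-safe":
        if backup_alive:
            wait_for_backup_ack(txn)     # behave like two-very-safe
        return True                      # primary alone may still commit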
