Unit 1 Notes

UNIT I
INTRODUCTION TO DBMS - Syllabus

File Systems Organization - Sequential, Pointer, Indexed, Direct - Purpose
of Database System - Database System Terminologies - Database
characteristics - Data models - Types of data models - Components of
DBMS - Relational Algebra.
LOGICAL DATABASE DESIGN: Relational DBMS - Codd's Rule - Entity-Relationship model - Extended ER - Normalization -
Functional Dependencies, Anomaly - 1NF to 5NF - Domain Key Normal Form -
Denormalization.
Databases
Definition: "a collection of related data"
represents some aspect of the real world (universe of discourse), generally
relevant to an enterprise/company/organization
logically coherent
organized to reflect relationships among the data
persistent
mirrors the state of the company/organization/enterprise, an asset in its own
right
usually built for a specific purpose and for a set of users -- but a good
design should allow for uses that are unanticipated.
Data: any raw information that needs to be stored persistently in the database.
Database Management System

A DBMS is a system of programs
Includes facilities to:
1. Define and modify the database structure
2. Construct the database on a storage medium
3. Manipulate the database: queries and updates
4. Maintain integrity with and security over the database
Meta-data (data about the database) is used to represent #1 and #2; the database administrator
supplies the meta-data
SQL is the most common language for #3.
Operations on the database are referred to as transactions.
The database administrator along with the DBMS itself covers #4.
Database Approaches / History

Flat files
separate files, each with a tabular organization
historically used punched cards and/or tapes
magnetic disks
today CSV files (comma separated values; text files; spreadsheet files); data
mining input;
Hierarchical (tree organization of data )
earliest approach of integrated data, IBM
today return to this approach with XML (eXtensible Markup Language, text
files)
Network (linked lists, directed graphs)
Efficient storage and retrieval
Complex design and navigation
Developed by CODASYL (Conference on Data Systems Languages), which
brought us COBOL
today the network approach is found in object-oriented databases (OODB)
Relational database (primary approach today)
Tables (relations) of rows (tuples) and columns (attributes)
Tables and attributes are named
Relationships between tables are established by common values
Mathematically based on set theory
SQL is the workhorse query language and is often adapted for other
paradigms
Object oriented (OODB)
Embedded in Java or C++ (extension of OO)
Unifies object heap space in memory and secondary storage
Return to network approach
Modeling Design of databases

Attempt to model semantics of the database
entity-relationship (ER) modeling
extended ER (EER)
Unified Modeling Language (UML)
Characteristics of modern database systems

Main Characteristics of database approach:
1. Self-Description: A database system includes in addition to the data stored that is of
relevance to the organization a complete definition/description of the database's
structure and constraints. This meta-data (i.e., data about data) is stored in the so-called
system catalog, which contains a description of the structure of each file, the type and
storage format of each field, and the various constraints on the data (i.e., conditions that
the data must satisfy).
The system catalog is used not only by users (e.g., who need to know the names of tables
and attributes, and sometimes data type information and other things), but also by the
DBMS software, which certainly needs to "know" how the data is structured/organized in
order to interpret it in a manner consistent with that structure. Recall that a DBMS is
general purpose, as opposed to being a specific database application. Hence, the structure
of the data cannot be "hard-coded" in its programs (such as is the case in typical file
processing approaches), but rather must be treated as a "parameter" in some sense.
2. Insulation between Programs and Data; Data Abstraction:
Program-Data Independence: In traditional file processing, the structure of the data
files accessed by an application is "hard-coded" in its source code. (E.g., Consider a file
descriptor in a COBOL program: it gives a detailed description of the layout of the
records in a file by describing, for each field, how many bytes it occupies.)
If, for some reason, we decide to change the structure of the data (e.g., by adding the first
two digits to the YEAR field, in order to make the program Y2K compliant!), every
application in which a description of that file's structure is hard-coded must be changed!
In contrast, DBMS access programs, in most cases, do not require such changes, because
the structure of the data is described (in the system catalog) separately from the programs
that access it and those programs consult the catalog in order to ascertain the structure of
the data (i.e., providing a means by which to determine boundaries between records and
between fields within records) so that they interpret that data properly.
In other words, the DBMS provides a conceptual or logical view of the data to
application programs, so that the underlying implementation may be changed without the
programs being modified. (This is referred to as program-data independence.)
Also, which access paths (e.g., indexes) exist are listed in the catalog, helping the DBMS
to determine the most efficient way to search for items in response to a query.
Note: In fairness to COBOL, it should be pointed out that it has a COPY feature that allows
different application programs to make use of the same file descriptor stored in a
"library". This provides some degree of program-data independence, but not nearly as
much as a good DBMS does. End of note.
Example by which to illustrate this concept: Suppose that you are given the task of
developing a program that displays the contents of a particular data file. Specifically,
each record should be displayed as follows:
Record #i:
value of first field
value of second field
...
...
value of last field
To keep things very simple, suppose that the file in question has fixed-length records of
57 bytes with six fixed-length fields of lengths 12, 4, 17, 2, 15, and 7 bytes, respectively,
all of which are ASCII strings. Developing such a program would not be difficult.
However, the obvious solution would be tailored specifically for a file having the
particular structure described here and would be of no use for a file with a different
structure.
Now suppose that the problem is generalized to say that the program you are to develop
must be able to display any file having fixed-length records with fixed-length fields that
are ASCII strings. Impossible, you say? Well, yes, unless the program has the ability to
access a description of the file's record layout (i.e., lengths of its records and the fields
therein), in which case the problem is not hard at all. This illustrates the power of
metadata, i.e., data describing other data.
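The generalized display program described above can be sketched as follows. This is only an illustration, not part of the original notes: the function names split_record and display_file and the constant LAYOUT are hypothetical, and numbering records from 1 is an assumption. The key point is that the record layout is passed in as data (metadata) instead of being hard-coded.

```python
from typing import List


def split_record(record: bytes, field_lengths: List[int]) -> List[str]:
    """Cut one fixed-length record into fields using a layout description."""
    if len(record) != sum(field_lengths):
        raise ValueError("record length does not match the layout")
    fields, offset = [], 0
    for length in field_lengths:
        fields.append(record[offset:offset + length].decode("ascii"))
        offset += length
    return fields


def display_file(data: bytes, field_lengths: List[int]) -> str:
    """Render every complete record in the 'Record #i:' format."""
    record_size = sum(field_lengths)
    lines = []
    for i in range(len(data) // record_size):
        record = data[i * record_size:(i + 1) * record_size]
        lines.append(f"Record #{i + 1}:")  # assumption: records are numbered from 1
        lines.extend(split_record(record, field_lengths))
    return "\n".join(lines)


# The layout from the example above: six fields totalling 57 bytes.
LAYOUT = [12, 4, 17, 2, 15, 7]
```

The same two functions handle any other fixed-length layout simply by passing a different list of field lengths, which is the role the system catalog plays for a DBMS.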
3. Multiple Views of Data: Different users (e.g., in different departments of an
organization) have different "views" or perspectives on the database. For example, from
the point of view of a Bursar's Office employee, student data does not include anything
about which courses were taken or which grades were earned. (This is an example of a
subset view.)
As another example, a Registrar's Office employee might think that GPA is a field of data
in each student's record. In reality, the underlying database might calculate that value
each time it is needed. This is called virtual (or derived) data.
A view designed for an academic advisor might give the appearance that the data is
structured to point out the prerequisites of each course.
A good DBMS has facilities for defining multiple views. This is not only convenient for
users, but also addresses security issues of data access. (E.g., The Registrar's Office view
should not provide any means to access financial data.)
4. Data Sharing and Multi-user Transaction Processing: As you learned about (or will)
in the OS course, the simultaneous access of computer resources by multiple
users/processes is a major source of complexity. The same is true for multi-user DBMS's.
Arising from this is the need for concurrency control, which is supposed to ensure that
several users trying to update the same data do so in a "controlled" manner so that the
results of the updates are as though they were done in some sequential order (rather than
interleaved, which could result in data being incorrect).
This gives rise to the concept of a transaction, which is a process that makes one or more
accesses to a database and which must have the appearance of executing in isolation from
all other transactions (even ones that access the same data at the "same time") and of
being atomic (in the sense that, if the system crashes in the middle of its execution, the
database contents must be as though it did not execute at all).
Applications such as airline reservation systems are known as online transaction
processing applications.
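A minimal sketch of the atomicity property described above, assuming Python's built-in sqlite3 module. The account table, the transfer function, and the sample balances are hypothetical and chosen only for illustration. The credit is applied before the debit so that a failing debit leaves a partial update that the rollback has to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back as well.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account"))
```

After a failed call such as transfer("A", "B", 500), balances() reports the same values as before the call, as though the transaction had not executed at all.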
Capabilities/Advantages of DBMS's
1. Controlling Redundancy: Data redundancy (such as tends to occur in the
"file processing" approach) leads to wasted storage space, duplication of
effort (when multiple copies of a datum need to be updated), and a higher
likelihood of the introduction of inconsistency.
On the other hand, redundancy can be used to improve performance of queries. Indexes,
for example, are entirely redundant, but help the DBMS in processing queries more
quickly.
Another example of using redundancy to improve performance is to store an "extra" field
in order to avoid the need to access other tables (as when doing a JOIN, for example).
See Figure 1.6 (page 18): the StudentName and CourseNumber fields need not be there.
A DBMS should provide the capability to automatically enforce the rule that no
inconsistencies are introduced when data is updated. (Figure 1.6 again, in which
Student_name does not match Student_number.)
2. Restricting Unauthorized Access: A DBMS should provide a security and
authorization subsystem, which is used for specifying restrictions on user
accounts. Common kinds of restrictions are to allow read-only access (no
updating), or access only to a subset of the data (e.g., recall the Bursar's and
Registrar's office examples from above).
3. Providing Persistent Storage for Program Objects: Object-oriented
database systems make it easier for complex runtime objects (e.g., lists,
trees) to be saved in secondary storage so as to survive beyond program
termination and to be retrievable at a later time.
4. Providing Storage Structures for Efficient Query Processing: The
DBMS maintains indexes (typically in the form of trees and/or hash tables)
that are utilized to improve the execution time of queries and updates. (The
choice of which indexes to create and maintain is part of physical database
design and tuning and is the responsibility of the DBA.)
The query processing and optimization module is responsible for choosing an efficient
query execution plan for each query submitted to the system.
5. Providing Backup and Recovery: The subsystem having this responsibility
ensures that recovery is possible in the case of a system crash during
execution of one or more transactions.
6. Providing Multiple User Interfaces: For example, query languages for
casual users, programming language interfaces for application programmers,
forms and/or command codes for parametric users, menu-driven interfaces
for stand-alone users.
7. Representing Complex Relationships Among Data: A DBMS should have
the capability to represent such relationships and to retrieve related data
quickly.
8. Enforcing Integrity Constraints: Most database applications are such that
the semantics (i.e., meaning) of the data require that it satisfy certain
restrictions in order to make sense. Perhaps the most fundamental constraint
on a data item is its data type, which specifies the universe of values from
which its value may be drawn. (E.g., a Grade field could be defined to be of
type Grade_Type, which, say, we have defined as including precisely the
values in the set { "A", "A-", "B+", ..., "F" }.)
Another kind of constraint is referential integrity, which says that if the database includes
an entity that refers to another one, the latter entity must exist in the database. For
example, if (R56547, CIL102) is a tuple in the Enrolled_In relation, indicating that a
student with ID R56547 is taking a course with ID CIL102, there must be tuples in the
Student and Course relations, respectively, that describe a student and a course with
those ID's.
9. Permitting Inferencing and Actions Via Rules: In a deductive database
system, one may specify declarative rules that allow the database to infer
new data! E.g., Figure out which students are on academic probation. Such
capabilities would take the place of application programs that would be used
to ascertain such information otherwise.
Active database systems go one step further by allowing "active rules" that
can be used to initiate actions automatically.
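Returning to item 8 (Enforcing Integrity Constraints), the two kinds of constraint can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions: the column names and the try_insert helper are hypothetical, and the grade list contains only the values spelled out in the text (the elided grades are left out). SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (student_id TEXT PRIMARY KEY);
CREATE TABLE Course  (course_id  TEXT PRIMARY KEY);
CREATE TABLE Enrolled_In (
    student_id TEXT NOT NULL REFERENCES Student(student_id),
    course_id  TEXT NOT NULL REFERENCES Course(course_id),
    -- Domain constraint standing in for Grade_Type (shortened list).
    grade      TEXT CHECK (grade IN ('A', 'A-', 'B+', 'F')),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO Student VALUES ('R56547');
INSERT INTO Course  VALUES ('CIL102');
""")


def try_insert(student_id, course_id, grade=None):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Enrolled_In VALUES (?, ?, ?)",
                (student_id, course_id, grade),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The tuple (R56547, CIL102) is accepted because both referenced tuples exist; an enrollment for an unknown student or course, or one with a grade outside the declared set, is rejected by the DBMS itself and needs no checking code in the application.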
Database Users and their responsibilities

Database Administrators (DBA)
oversee design
manage resources and other users
authorization/security control to database
coordinating and monitoring its use
acquiring software resources and hardware resources as needed
the DBA is also accountable for problems such as breach of security or poor
system response time
Database Designers
specify the structure of the data that will be stored in the database
identify the data to be stored
Systems analysts -- specify the system using input from the customer; provide a complete description
of functionality from the customers' and users' point of view
Applications programmers -- implement application programs (transactions) that access data
and support enterprise rules
Project managers
System administrator -- maintains transaction processing system: monitors interconnection of
HW and SW modules, deals with failures and congestion
End Users: These are persons who access the database for querying, updating, and report
generation. They are the main reason for the database's existence!
Casual end users: use database occasionally, needing different information each time;
use query language to specify their requests; typically middle- or high-level managers.
Naive/Parametric end users: Typically the biggest group of users; frequently
query/update the database using standard canned transactions that have been carefully
programmed and tested in advance. Examples:
o bank tellers check account balances, post withdrawals/deposits
o reservation clerks for airlines, hotels, etc., check availability of seats/rooms and
make reservations.
o shipping clerks (e.g., at UPS) who use buttons, bar code scanners, etc., to update
status of in-transit packages.
Sophisticated end users: engineers, scientists, business analysts who implement their
own applications to meet their complex needs.
Stand-alone users: Use "personal" databases, possibly employing a special-purpose
(e.g., financial) software package.
Workers Behind the Scene
DBMS system designers/implementors: provide the DBMS software that
is at the foundation of all this!
tool developers: design and implement software tools facilitating database
system design, performance monitoring, creation of graphical user interfaces,
prototyping, etc.
operators and maintenance personnel: responsible for the day-to-day
operation of the system.
Three Level Database Architecture

Data and Related Structures
Data are actually stored as bits, or numbers and strings, but it is difficult to work with data at this
level. It is necessary to view data at different levels of abstraction.
Schema:
Description of data at some level. Each level has its own schema.
We will be concerned with three forms of schemas:
physical,
conceptual, and
external.
Physical Data Level

The physical schema describes details of how data is stored: files, indices, etc. on the random
access disk system. It also typically describes the record layout of files and type of files (hash, B-tree, flat).
Early applications worked at this level - explicitly dealt with details. E.g., minimizing physical
distances between related data and organizing the data structures within the file (blocked records,
linked lists of blocks, etc.)
Problem:
Routines are hardcoded to deal with physical representation.
Changes to data structures are difficult to make.
Application code becomes complex since it must deal with details.
Rapid implementation of new features very difficult.
Conceptual Data Level

Also referred to as the Logical level. Hides details of the physical level.
In the relational model, the conceptual schema presents data as a set of
tables.
The DBMS maps data access from the conceptual schema to the physical schema automatically.
Physical schema can be changed without changing application:
DBMS must change mapping from conceptual to physical.
Referred to as physical data independence.
External Data Level

In the relational model, the external schema also presents data as a set of relations. An external
schema specifies a view of the data in terms of the conceptual level. It is tailored to the needs of
a particular category of users. Portions of the stored data should not be seen by some users; hiding them
begins to implement a level of security and simplifies the view for these users.
Examples:
Students should not see faculty salaries.
Faculty should not see billing or payment data.
Information that can be derived from stored data might be viewed as if it were stored.
GPA not stored, calculated when needed.
Applications are written in terms of an external schema. The external view is computed when
accessed. It is not stored. Different external schemas can be provided to different categories of
users. Translation from external level to conceptual level is done automatically by DBMS at run
time. The conceptual schema can be changed without changing application:
Mapping from external to conceptual must be changed.
Referred to as conceptual (or logical) data independence.
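An external schema of this kind is commonly realized as a view. The sketch below assumes Python's built-in sqlite3 module; the table names, the column names (including balance_due), the view name RegistrarStudent, and the sample rows are all hypothetical. It shows both a subset view (billing data is hidden) and derived data (GPA is computed on access, never stored).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT, balance_due REAL);
CREATE TABLE Grade (
    student_id INTEGER REFERENCES Student(id),
    course     TEXT,
    credits    INTEGER,
    points     REAL
);

-- External schema for the Registrar: no billing column, GPA derived on demand.
CREATE VIEW RegistrarStudent AS
SELECT s.id AS id,
       s.name AS name,
       SUM(g.points * g.credits) / SUM(g.credits) AS gpa
FROM Student s JOIN Grade g ON g.student_id = s.id
GROUP BY s.id, s.name;

INSERT INTO Student VALUES (1, 'John', 250.0);
INSERT INTO Grade VALUES (1, 'CSE305', 4, 4.0), (1, 'CSE306', 2, 3.0);
""")


def registrar_view():
    """Return (column names, rows) as seen through the external schema."""
    cur = conn.execute("SELECT * FROM RegistrarStudent")
    return [d[0] for d in cur.description], cur.fetchall()
```

Because the view is computed each time it is queried, inserting another Grade row changes the GPA that registrar_view() reports without any stored GPA value being updated.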
Data Independence
Logical data independence
Immunity of external models to changes in the logical model
Occurs at user interface level
Physical data independence
Immunity of logical model to changes in internal model
Occurs at logical interface level
Database Models
A database model is a theory or specification describing how a database is
structured and used. Several such models have been suggested.
The common models include
Network Model - Any links supporting quick access.
Hierarchical Model - Links but no cycles (hierarchy).
Relational Model - Data Independence.
Object Oriented Model - Entity Abstraction.
Network Model
The popularity of the network data model coincided with the popularity of the
hierarchical data model. Some data were more naturally modeled with more than
one parent per child. So, the network model permitted the modeling of many-to-many relationships in data. In 1971, the Conference on Data Systems Languages
(CODASYL) formally defined the network model. The basic data modeling construct
in the network model is the set construct. A set consists of an owner record type, a
set name, and a member record type. A member record type can have that role in
more than one set, hence the multiparent concept is supported. An owner record
type can also be a member or owner in another set. The data model is a simple
network, and link and intersection record types (called junction records by IDMS)
may exist, as well as sets between them. Thus, the complete network of
relationships is represented by several pairwise sets; in each set some (one) record
type is owner (at the tail of the network arrow) and one or more record types are
members (at the head of the relationship arrow). Usually, a set defines a 1:M
relationship, although 1:1 is permitted. The CODASYL network model is based on
mathematical set theory.
Hierarchical Model
The hierarchical data model organizes data in a tree structure. There is a hierarchy
of parent and child data segments. This structure implies that a record can have
repeating information, generally in the child data segments. Data is stored in a series of
records, each of which has a set of field values attached to it. The model collects all the instances of
a specific record together as a record type. These record types are the equivalent of
tables in the relational model, with the individual records being the equivalent
of rows. To create links between these record types, the hierarchical model uses
parent-child relationships. These are a 1:N mapping between record types. The links
are represented using trees, a structure "borrowed" from mathematics, just as the relational
model borrows set theory. For example, an organization might store information about an employee,
such as name, employee number, department, salary. The organization might also
store information about an employee's children, such as name and date of birth.
The employee and children data forms a hierarchy, where the employee data
represents the parent segment and the children data represents the child segment.
If an employee has three children, then there would be three child segments
associated with one employee segment. In a hierarchical database the parent-child
relationship is one to many. This restricts a child segment to having only one parent
segment. Hierarchical DBMSs were popular from the late 1960s, with the
introduction of IBM's Information Management System (IMS) DBMS, through the
1970s.
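The employee/children example can be sketched as a small tree structure. This is an illustration only: the Segment class and its field names are hypothetical and do not reflect how IMS or any other hierarchical DBMS stores data. The single-parent restriction is enforced in add_child.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Segment:
    """One record (segment) in a hierarchy; it may have at most one parent."""
    record_type: str
    values: dict
    parent: Optional["Segment"] = field(default=None, repr=False)
    children: List["Segment"] = field(default_factory=list)

    def add_child(self, child: "Segment") -> "Segment":
        # The parent-child relationship is 1:N, so a second parent is refused.
        if child.parent is not None:
            raise ValueError("a child segment can have only one parent")
        child.parent = self
        self.children.append(child)
        return child


def build_example() -> Segment:
    """One employee segment with three child segments (sample values)."""
    employee = Segment("Employee", {"name": "Mary", "department": "Sales"})
    for child_name in ("Ann", "Bob", "Cy"):
        employee.add_child(Segment("Child", {"name": child_name}))
    return employee
```

Trying to attach one of the child segments to a second employee raises ValueError, which is exactly the limitation that the network model was designed to lift.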
Relational Model
(RDBMS - relational database management system) A database based on the
relational model developed by E.F. Codd. A relational database allows the definition
of data structures, storage and retrieval operations and integrity constraints. In such
a database the data and relations between them are organised in tables. A table is a
collection of records and each record in a table contains the same fields.
Properties of Relational Tables:
# Values Are Atomic
# Each Row is Unique
# Column Values Are of the Same Kind
# The Sequence of Columns is Insignificant
# The Sequence of Rows is Insignificant
# Each Column Has a Unique Name
Certain fields may be designated as keys, which means that searches for specific
values of that field will use indexing to speed them up. Where fields in two different
tables take values from the same set, a join operation can be performed to select
related records in the two tables by matching values in those fields. Often, but not
always, the fields will have the same name in both tables. For example, an "orders"
table might contain (customer-ID, product-code) pairs and a "products" table might
contain (product-code, price) pairs so to calculate a given customer's bill you would
sum the prices of all products ordered by that customer by joining on the product-code fields of the two tables. This can be extended to joining multiple tables on
multiple fields. Because these relationships are only specified at retrieval time,
relational databases are classed as dynamic database management systems. The
relational database model is based on relational algebra.
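The orders/products example can be sketched with Python's built-in sqlite3 module. The sample rows and the function name customer_bill are hypothetical; underscores replace the hyphens in the column names because hyphens are not valid in unquoted SQL identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_code TEXT PRIMARY KEY, price REAL NOT NULL);
CREATE TABLE orders (
    customer_id  TEXT NOT NULL,
    product_code TEXT NOT NULL REFERENCES products(product_code)
);
INSERT INTO products VALUES ('P1', 10.0), ('P2', 2.5);
INSERT INTO orders VALUES ('C1', 'P1'), ('C1', 'P2'), ('C1', 'P2'), ('C2', 'P1');
""")


def customer_bill(customer_id):
    """Sum the prices of everything a customer ordered, via a join."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders o
        JOIN products p ON p.product_code = o.product_code
        WHERE o.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]
```

The relationship between the two tables exists only through the matching product_code values; nothing links an order row to a product row until the join is evaluated at query time.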
Object-Oriented Model
Uses E-R modeling as a basis, extended to include encapsulation and inheritance
Objects have both state and behavior
State is defined by attributes
Behavior is defined by methods (functions or procedures)
Designer defines classes with attributes, methods, and relationships

Class constructor method creates object instances
Each object has a unique object ID
Classes related by class hierarchies
Database objects have persistence
Both conceptual-level and logical-level model
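The points above (state, behavior, constructors, object IDs, class hierarchies) can be sketched in plain Python. The class names PersistentObject, Person, and Student are hypothetical, and the sketch does not implement persistence: an actual OODB would also save these objects to secondary storage.

```python
import itertools


class PersistentObject:
    """Base class: every instance receives a unique object ID (OID)."""
    _next_oid = itertools.count(1)

    def __init__(self):
        self.oid = next(PersistentObject._next_oid)


class Person(PersistentObject):
    def __init__(self, name):       # constructor creates an object instance
        super().__init__()
        self.name = name            # state is held in attributes

    def describe(self):             # behavior is defined by methods
        return f"Person {self.name}"


class Student(Person):              # class hierarchy: Student inherits from Person
    def __init__(self, name, major):
        super().__init__(name)
        self.major = major

    def describe(self):
        return f"Student {self.name} majoring in {self.major}"
```

Two objects with identical attribute values still have different OIDs, which is how an object-oriented model tells them apart without needing a key attribute.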
The Entity-Relationship Model
Database Design
Goal of design is to generate a formal specification of the database schema
Methodology:
1. Use E-R model to get a high-level graphical view of essential components of
enterprise and how they are related
2. Then convert E-R diagram to SQL DDL, or whatever database model you are
using
The E-R model is not SQL based and is not limited to any particular DBMS. It is a conceptual and
semantic model that captures meanings rather than an actual implementation
The E-R Model: The enterprise is viewed as set of
Entities
Relationships among entities
Symbols used in E-R Diagram
Entity -- rectangle
Attribute -- oval
Relationship -- diamond
Link -- line
Entities and Attributes
Entity: an object that is involved in the enterprise and that can be distinguished from
other objects (not shown in the ER diagram -- it is an instance)
Can be person, place, event, object, concept in the real world
Can be physical object or abstraction
Ex: "John", "CSE305"
Entity Type: set of similar objects or a category of entities; they are well defined
A rectangle represents an entity set
Ex: students, courses
We often just say "entity" and mean "entity type"
Attribute: describes one aspect of an entity type; usually [and best when] single valued and
indivisible (atomic)
Represented by oval on E-R diagram
Ex: name, maximum enrollment
May be multi-valued -- use double oval on E-R diagram
May be composite -- attribute has further structure; also use oval for
composite attribute, with ovals for components connected to it by lines
May be derived -- a virtual attribute, one that is computable from existing
data in the database; use dashed oval. This helps reduce redundancy
Entity Types
An entity type is named and is described by set of attributes
Student: Id, Name, Address, Hobbies
Domain: possible values of an attribute.
Note that the value for an attribute can be a set or list of values, sometimes
called "multi-valued" attributes
This is in contrast to the pure relational model which requires atomic values
E.g., (111111, John, 123 Main St, (stamps, coins))
Key: subset of attributes that uniquely identifies an entity (candidate key)

Entity Schema:
The meta-information of entity type name, attributes (and associated domain), key constraints
Entity Types tend to correspond to nouns; attributes are also nouns albeit descriptions of the
parts of entities
May have null values for some entity attribute instances -- no mapping to domain for those
instances
Keys
Superkey: an attribute or set of attributes that uniquely identifies an entity--there can be many of
these
Composite key: a key requiring more than one attribute
Candidate key: a superkey such that no proper subset of its attributes is also a superkey
(minimal superkey has no unnecessary attributes)
Primary key: the candidate key chosen to be used for identifying entities and accessing records.
Unless otherwise noted "key" means "primary key"
Alternate key: a candidate key not used for primary key
Secondary key: attribute or set of attributes commonly used for accessing records, but not
necessarily unique
Foreign key: term used in relational databases (but not in the E-R model) for an attribute that
is the primary key of another table and is used to establish a relationship with that table where it
appears as an attribute also.
So a foreign key value occurs in one table and again in the other table. This conflicts with the
idea that a value is stored only once, but it does not undermine the idea that a fact is stored once.
Rectangle -- Entity
Ellipses -- Attribute (underlined attributes are [part of] the primary key)
Double ellipses -- multi-valued attribute
Dashed ellipses-- derived attribute, e.g. age is derivable from birthdate and current date.
[Drawing notes: keep all attributes above the entity. Lines have no arrows. Use straight lines
only]
Graphical Representation in E-R diagram
Relationships
Relationship: connects two or more entities into an association/relationship
"John" majors in "Computer Science"
Relationship Type: set of similar relationships
Student (entity type) is related to Department (entity type) by MajorsIn
(relationship type).
Relationship Types may also have attributes in the E-R model. When they are mapped to the
relational model, the attributes become part of the relation. Represented by a diamond on E-R
diagram.
Relationship types can have descriptive attributes like entity sets
Relationships tend to be verbs or verb phrases; attributes of relationships are again nouns
Attributes and Roles

An attribute of a relationship type adds additional information to the relationship
e.g., "John" majors in "CS" since 2000
John and CS are related
2000 describes the relationship - it's the value of the since attribute of
MajorsIn relationship type
The role of a relationship type names one of the related entities. The name of the entity is usually
the role name.
e.g., "John" is value of Student role, "CS" value of Department role of MajorsIn
relationship type
(John, CS, 2000) describes a relationship
Problem: relationships can relate elements of same entity type
e.g., ReportsTo relationship type relates two elements of Employee entity type:
Bob reports to Mary since 2000
We do not have distinct names for the roles. It is not clear who reports to whom.
Solution: the role name of relationship type need not be same as name of entity type from which
participants are drawn
ReportsTo has roles Subordinate and Supervisor and attribute Since
Values of Subordinate and Supervisor both drawn from entity type Employee
Optional to name role of each entity-relationship, but helpful in cases of
Recursive relationship -- entity set relates to itself
Multiple relationships between same entity sets
Roles are edges labeled with role names (omitted if role name = name of entity set). Most
attributes have been omitted.
Degree of relationship
The number of roles in the relationship
Binary -- links two entity sets; set of ordered pairs (most common)
Ternary -- links three entity sets; ordered triples (rare). If a relationship exists among the
three entities, all three must be present
N-ary -- links n entity sets; ordered n-tuples (very rare). If a relationship exists among the
entities, then all must be present. Cannot represent subsets.
Note: a ternary relationship may sometimes be replaced by two binary relationships, but the two
forms are not necessarily semantically equivalent.
Cardinality of Relationships
Cardinality is the number of entity instances to which another entity set can map under the
relationship. This does not reflect a requirement that an entity has to participate in a relationship.
Participation is another concept.
One-to-one: X-Y is 1:1 when each entity in X is associated with at most one entity in Y, and
each entity in Y is associated with at most one entity in X.
One-to-many: X-Y is 1:M when each entity in X can be associated with many entities in Y, but
each entity in Y is associated with at most one entity in X.
Many-to-many: X-Y is M:M if each entity in X can be associated with many entities in Y, and
each entity in Y can be associated with many entities in X ("many" means one or more, and sometimes
zero)
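The three cases can be told apart by counting how many partners each entity has. The sketch below is an illustration with a hypothetical function name. It classifies one observed set of relationship instances, whereas cardinality in a design is a rule about all allowed instances, so the result is only the tightest label consistent with the data given.

```python
from collections import Counter


def classify_cardinality(pairs):
    """Classify observed (x, y) instances as '1:1', '1:M', 'M:1' or 'M:M'."""
    pairs = set(pairs)
    per_x = Counter(x for x, _ in pairs)  # number of Y partners for each X
    per_y = Counter(y for _, y in pairs)  # number of X partners for each Y
    x_has_many = any(n > 1 for n in per_x.values())
    y_has_many = any(n > 1 for n in per_y.values())
    if x_has_many and y_has_many:
        return "M:M"
    if x_has_many:
        return "1:M"  # some X has many Y; every Y has at most one X
    if y_has_many:
        return "M:1"
    return "1:1"
```

For example, classify_cardinality([("d1", "e1"), ("d1", "e2"), ("d2", "e3")]) returns "1:M", because d1 is associated with two entities while no entity on the right is shared.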
Relationship Participation Constraints
Total participation
Every member of the entity set must participate in the relationship
Represented by double line from entity rectangle to relationship diamond
E.g., A Class entity cannot exist unless related to a Faculty member entity in
this example, not necessarily at Juniata.
In a relational model we will use the references clause.
Key constraint
If every entity participates in exactly one relationship, both a total
participation and a key constraint hold
E.g., if a class is taught by only one faculty member.
Partial participation
Not every entity instance must participate
Represented by single line from entity rectangle to relationship diamond
E.g., A Textbook entity can exist without being related to a Class or vice
versa.
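One way these participation constraints might carry over to a relational schema is sketched below with Python's built-in sqlite3 module. The table and column names and the accepted helper are hypothetical. A NOT NULL column with a references clause gives a Class both total participation and the key constraint (exactly one Faculty member), while a nullable reference gives a Textbook partial participation. Modeling the Textbook-Class link as a single column is itself an assumption: it limits each textbook to at most one class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce REFERENCES
conn.executescript("""
CREATE TABLE Faculty (faculty_id INTEGER PRIMARY KEY, name TEXT);

-- Total participation plus key constraint: exactly one teacher per class.
CREATE TABLE Class (
    class_id   INTEGER PRIMARY KEY,
    title      TEXT,
    faculty_id INTEGER NOT NULL REFERENCES Faculty(faculty_id)
);

-- Partial participation: a textbook may exist without any class.
CREATE TABLE Textbook (
    isbn     TEXT PRIMARY KEY,
    class_id INTEGER REFERENCES Class(class_id)
);

INSERT INTO Faculty VALUES (1, 'Mary');
""")


def accepted(sql, params=()):
    """Run one statement; report whether the constraints allowed it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

A class with no faculty member, or with a faculty_id that does not exist, is rejected, while a textbook with a NULL class_id is accepted.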
Existence Dependency and Weak Entities
Existence dependency: Entity Y is existence dependent on entity X if each instance of Y must
have a corresponding instance of X
In that case, Y must have total participation in its relationship with X
If Y does not have its own candidate key, Y is called a weak entity, and X is a strong entity
Weak entity may have a partial key, called a discriminator, that distinguishes instances of the
weak entity that are related to the same strong entity
Use double rectangle for weak entity, with double diamond for relationship connecting it to its
associated strong entity
Note: not all existence-dependent entities are weak; the lack of a key is essential to the definition
Schema of a Relationship Type

Contains the following features:
Role names, Ri, and their corresponding entity sets. Roles must be single valued (the number of
roles is called its degree)
Attribute names, Aj, and their corresponding domains. Attributes in the E-R model may be set or
multi-valued.
Key: Minimum set of roles and attributes that uniquely identify a relationship
Relationship: <e1, ..., en; a1, ..., ak>
ei is an entity, a value from Ri's entity set
aj is a set of attribute values with elements from the domain of Aj
Example ER diagram
Mapping the ER Model to Relational DBs
Database Design
Goal of design is to generate a formal specification of the database schema
Methodology:
1. Use E-R model to get a high-level graphical view of essential components of
enterprise and how they are related
2. Then convert E-R diagram to SQL Data Definition Language (DDL), or
whatever database model you are using
The E-R Model is not SQL based.
The E-R Model: The database is viewed as a graphical drawing of
Entities and attributes
Relationships among those entities
--not tables!
Relational Model: The database is viewed as a set of
Tables
and their attributes (keys)
--we could include constraints but will not at this stage.
Representation of Entity Type in Relational Model

Mapping #1: Each entity type always corresponds to a relation
---> Person(....)
Mapping #2: The attributes of a relation contain at least the simple attributes of an entity
type
Attributes are single valued
There may be additional attributes (foreign keys)
Persons(SSN, FirstName, LastName, Address, Birthdate)

Problem: Recall that the entity type can have multi-valued attributes.
Possible solution: Use several rows to represent a single entity
(111111, John, 123 Main St, stamps)
(111111, John, 123 Main St, coins)
Problems with this solution:
Redundancy of the other attributes (never good)
Key of entity type no longer can be key of relation
So the resulting relation must be further transformed. Normalization is the process we will
study to deal with this; it would result in:
Persons(SSN, FirstName, LastName, Address, Birthdate)
Hobbies(SSN, Hobby)
Relationship mapping
Relationship: connects two or more entities into an association/relationship
John majors in Computer Science
Relationship Type: set of similar relationships
Student (entity type) related to Department (entity type) by MajorsIn (relationship type).
Distinction
relation (relational model) - set of tuples
relationship (E-R Model) - describes relationship between entities of an enterprise
Entity types and most relationship types in the E-R model are mapped to relations (relational
model)
Mapping #3: 1-1 and 1-many relationships between separate entities need not be mapped to a
relation; the primary key attributes of the "1" relation become foreign key attributes of the
"many" relation
If no "Since" attribute, the relations could be (with some appropriate attribute renaming and
additions)
Students(StudId, Name, Dept)
Departments(Dept, Chair)
Relationship Types may also have attributes in the E-R model.
Mapping #4: Any attributes of the 1-1 or 1-many relationship may be attached to the "many"
relation.
Students(StudId, Name, Dept, Since)
Departments(Dept, Chair)
Mapping #5: many-many relationships are always mapped to a separate relation
Textbooks(ISBN, Title, Author, Copyright, Edition, Price)

Class(ClassNo, Name, Room, Days, Time)
TextUses(ISBN, ClassNo)
Mapping #6: The attributes of many-many relationships become part of the relationship type
relation, as well as the primary key attributes of the related entity types
TextUses(ISBN, ClassNo, Optional)
Projects(ProjId, Name, TotalCost, StartDate)

Parts(UPC, PartName, Weight, WSPrice)
Suppliers(SupId, Name, Address)
Sold(ProjId, UPC, SupId, Date, Price)
Relationships tend to be verbs; attributes of relationships are nouns or adverbs
Roles
Problem: recursive relationships can relate elements of same entity type
e.g., the ReportsTo relationship type relates two elements of the Employee entity type:
Bob reports to Mary since 2000
We do not always have distinct names for the roles

It is not clear who reports to whom
Solution: the role name of relationship type need not be same as name of entity type from which
participants are drawn
ReportsTo has roles Subordinate and Supervisor and attribute Since
Values of Subordinate and Supervisor both drawn from entity type Employee
Mapping #7: If the cardinality of a recursive relationship is 1-many or 1-1, then a second
attribute of the same domain as the key may be added to the entity relation to establish the
relationship. Attributes of the relationship can also be added to the entity relation, but there
may be a good reason to create a separate relation with the attributes and keys of the entities.
Employees(EmpID, Name, Address, Salary, SupervisorID)

Persons(PID, Name, Address, SpouseID, Mdate)
Mapping #8: for many-many recursive relationships, create a relation that includes the
attributes of the relationship and the primary key of the entity twice, once for each role.
Assume multiple marriages are now recorded, thus many-to-many
MarriedTo(HusbandID, WifeID, MarDate, DivDate)
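Mapping #7 can be tried out with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the Address and Salary columns are left out for brevity, and the employee "Ann" is invented for illustration (the notes mention only Bob and Mary). The self-join gives each copy of Employees its own alias, which plays the part of the Subordinate and Supervisor role names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

# SupervisorID has the same domain as the key EmpID and refers back to Employees.
conn.execute("""
    CREATE TABLE Employees (
        EmpID        INTEGER PRIMARY KEY,
        Name         TEXT NOT NULL,
        SupervisorID INTEGER REFERENCES Employees (EmpID)
    )
""")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "Mary", None), (2, "Bob", 1), (3, "Ann", 1)],
)

# Self-join: 'sub' plays the Subordinate role, 'sup' the Supervisor role.
rows = conn.execute("""
    SELECT sub.Name, sup.Name
    FROM Employees AS sub
    JOIN Employees AS sup ON sub.SupervisorID = sup.EmpID
    ORDER BY sub.Name
""").fetchall()

for subordinate, supervisor in rows:
    print(subordinate, "reports to", supervisor)
```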
Examples
S2000Courses (CrsCode, SectNo, Enroll)

Professor (Id, DeptId, Name)
Teaching (CrsCode, SectNo, Id, RoomNo)
Real SQL code

CREATE TABLE WorksIn (
    Since  DATE,       -- attribute
    Status CHAR (10),  -- attribute
    ProfId INTEGER,    -- role (key of Professor)
    DeptId CHAR (4),   -- role (key of Department)
    PRIMARY KEY (ProfId),  -- since a professor works in at most one department
    FOREIGN KEY (ProfId) REFERENCES Professor (Id),
    FOREIGN KEY (DeptId) REFERENCES Department
);
CREATE TABLE Sold (
    Price      INTEGER,  -- attribute
    Date       DATE,     -- attribute
    ProjId     INTEGER,  -- role
    SupplierId INTEGER,  -- role
    PartNumber INTEGER,  -- role
    PRIMARY KEY (ProjId, SupplierId, PartNumber, Date),
    FOREIGN KEY (ProjId) REFERENCES Project (Id),
    FOREIGN KEY (SupplierId) REFERENCES Supplier (Id),
    FOREIGN KEY (PartNumber) REFERENCES Part (Number)
);
The Relational Data Model

History of Relational Model
1970 paper by E.F. Codd, "A Relational Model of Data for Large Shared Data Banks", proposed the
relational model
System R, prototype developed at IBM Research Lab at San Jose, California, late 1970s
Peterlee Test Vehicle, IBM UK Scientific Lab
INGRES, University of California at Berkeley, in Unix
System R results used in developing DB2 from IBM and also Oracle
Early microcomputer based DBMSs were relational - dBase, R:base, Paradox
Microsoft's Access, now most popular microcomputer-based DBMS, is relational
Oracle, DB2, Informix, Sybase, Microsoft's SQL Server, MySQL, PostgreSQL - most popular
enterprise DBMSs, all relational
Advantages of Relational Model
Based on mathematical notion of relation
Can use power of mathematical abstraction
Can develop body of results using theorem and proof method of mathematics; results then
apply to many different applications
Can use expressive, exact mathematical notation
Theory provides tools for improving design
Basic structure is simple, easy to understand
Separates logical from physical level
Data operations easy to express, using a few powerful commands
Operations do not require user to know storage structures used
Data Structures
Relations are represented abstractly as tables
Tables are related to one another
Table holds information about objects or entities
Rows (tuples) correspond to individual entities
Each tuple is distinct; there are no duplicate tuples
Order of tuples is immaterial
Cardinality of relation = number of tuples
Columns correspond to attributes
Each column has a distinct name, the name of the attribute it represents
Order of attributes not important
Each cell contains at most one value
A column contains values from one domain
Domains consist of atomic values
Arity = number of attributes, sometimes called the degree of the relation
Example: Relations
Student table tells facts about students
Faculty table shows facts about faculty
Class table shows facts about classes, including what faculty member
teaches each
Enroll table relates students to classes
Student
stuId   lastName   firstName   major     credits
S1001   Smith      Tom         History   90
S1002   Chin       Ann         Math      36
S1005   Lee        Perry       History
S1010   Burns      Edward      Art       63
S1013   McCarthy   Owen        Math
S1015   Jones      Mary        Math      42
S1020   Rivera     Jane        CSC       15
Class
classNumber   facId   schedule   room
ART103A       F101    MWF9       H221
CSC201A       F105    TuThF10    M110
CSC203A       F105    MThF12     M110
HST205A       F115    MWF11      H221
MTH101B       F110    MTuTh9     H225
MTH103C       F110    MWF11      H225
Faculty
facId   name     department   rank
F101    Adams    Art          Professor
F105    Tanaka   CSC          Instructor
F110    Byrne    Math         Assistant
F115    Smith    History      Associate
F221    Smith    CSC          Professor
Enroll
stuId   classNumber   grade
S1001   ART103A
S1001   HST205A
S1002   ART103A
S1002   CSC201A
S1002   MTH103C
S1010   ART103A
S1010   MTH103C
S1020   CSC201A
S1020   MTH101B
Mathematical Relations
For two sets D1 and D2, the Cartesian product, D1 × D2, is the set of all ordered pairs in which
the first element is from D1 and the second is from D2. The domains for the two sets are arbitrary.
A relation, then, is any subset of the Cartesian product.
One can form a Cartesian product of 3 sets; a relation is any subset of the ordered triples so
formed.
This can extend to n sets, using n-tuples
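A small sketch of these definitions, using Python sets. The two domains and the selection condition are arbitrary choices made for illustration; any other subset of the product would be an equally valid relation.

```python
from itertools import product

# Two arbitrary domains (illustrative values only)
D1 = {1, 2}
D2 = {"a", "b", "c"}

# Cartesian product D1 x D2: every ordered pair (first from D1, second from D2)
cartesian = set(product(D1, D2))

# A relation is ANY subset of the product; this one is picked by an arbitrary rule.
relation = {(x, y) for (x, y) in cartesian if x == 1 or y == "c"}

print(len(cartesian))          # |D1| * |D2| pairs
print(sorted(relation))
print(relation <= cartesian)   # subset test

# The same idea extends to n sets: product(D1, D2, D3) yields ordered triples.
```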
Database Relations
A relation schema, named R, is a set of attributes A1, A2, ..., An with their corresponding domains
D1, D2, ..., Dn
A relation r on relation schema R is a set of mappings from the attributes to their domains;
that is, r is a set of n-tuples (A1:d1, A2:d2, ..., An:dn) such that d1 ∈ D1, d2 ∈ D2, ..., dn ∈ Dn
In a table to represent the relation, list the Ai's as column headings, and let the (d1, d2, ..., dn)
become the n-tuples, the rows of the table
Relation Schema
A schema defines the following
Relation name
Attribute names and domains
Integrity constraints
e.g.,:
The values of a particular attribute in all tuples are unique
The values of a particular attribute in all tuples are greater than 0
Default values
Relational Database
Finite set of relations
Each relation consists of a schema definition and an instance of the relation
Database schema = set of relation schemas (and other things)
Database instance = set of (corresponding) relation instances
Example
Student (StuId: INT, LastName: STRING, FirstName: STRING, major: STRING, credits: DEC)
Faculty (FacId: STRING, Name: STRING, Dept: DEPTS, Rank: RANKS)
Class (FacId: STRING, Schedule: STRING, Room: STRING, ClassNum: COURSES)
Enroll (ClassNum: COURSES, StudId: DEC, Grade: GRADES)
Department(DeptId: DEPTS, Name: STRING)
TableName (attr1:type, attr2:type, ... ) is a simplified non-SQL description of the table.
Relation Keys
Relations never have duplicate tuples, so you can always tell tuples apart; implies there is always
a key (which may be a composite of all attributes, in worst case)
Superkey: set of attributes that uniquely identifies tuples
Candidate key: superkey such that no proper subset of itself is also a superkey (i.e. it has no
unnecessary attributes)
Primary key: candidate key chosen for unique identification of tuples

Cannot verify a key by looking at an instance; need to consider semantic information to ensure
uniqueness
A foreign key is an attribute or combination of attributes whose values refer to the primary key
of some relation (called its home relation). Usually the home relation is some other relation, but
there can be cases of self-referencing (a recursive relationship)
Key Constraint
Values in a column (or columns) of a relation are unique: at most one row in a relation instance
can contain a particular value(s)
Key - set of attributes satisfying key constraint
e.g., Id in Student,
e.g., (StudId, CrsCode, Semester) in Transcript
Minimality - no subset of a key is a key. When you determine a key, this rule should be applied.
(StudId, CrsCode) is not a key of Transcript
Superkey - set of attributes containing key
(Id, Name) is a superkey of Student, but as a key, it's not minimal
Every relation has a key. The goal is to determine the "best" key, but a relation can have several
keys:
primary key (Id in Student) (cannot be null) -- only one is designated per relation
candidate key ((Name, Address) in Student) is a potential key and is sometimes used as
information to the DBMS to set up an index for efficient lookup.
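The point that a key cannot be verified from an instance, only refuted, can be sketched as follows. The helper name violates_key is hypothetical; the rows are the Faculty table from the earlier example. An instance with two rows that agree on a set of attributes proves those attributes are not a superkey, while the absence of such rows proves nothing about future instances.

```python
def violates_key(rows, attrs):
    """Return True if two rows agree on all of attrs, so attrs cannot be a superkey."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return True
        seen.add(value)
    return False


faculty = [
    {"facId": "F101", "name": "Adams", "department": "Art", "rank": "Professor"},
    {"facId": "F105", "name": "Tanaka", "department": "CSC", "rank": "Instructor"},
    {"facId": "F110", "name": "Byrne", "department": "Math", "rank": "Assistant"},
    {"facId": "F115", "name": "Smith", "department": "History", "rank": "Associate"},
    {"facId": "F221", "name": "Smith", "department": "CSC", "rank": "Professor"},
]

print(violates_key(faculty, ["name"]))                # two Smiths: name is not a key
print(violates_key(faculty, ["facId"]))               # no counterexample in this instance
print(violates_key(faculty, ["name", "department"]))  # also no counterexample, yet two
# faculty members with the same name could join one department later, so only the
# semantics of the enterprise can justify choosing a key.
```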
Foreign Key Constraint

Also known as referential integrity: an item named in one relation must correspond to tuple(s)
in another that describes the item
Examples:
Transcript (CrsCode) references Course(CrsCode)
Professor(DeptId) references Department(DeptId)
We say "a1 is a foreign key of R1 referring to a2 in R2", meaning that if v is the value of a1, then
there is a unique tuple in R2 in which a2 has the same value v
This is a special case of referential integrity: a2 must be a candidate key of R2 (CrsCode is a
key of Course), i.e., not necessarily the primary key (though it often is)
If no such row exists in R2, then we have a violation of referential integrity
Not all rows of R2 need to be referenced: the relationship is not symmetric (some course might
not be taught)
The value of a foreign key might not be specified (DeptId column of some professor might be null)
Example
Note the foreign key might consist of several columns:
(CrsCode, Semester) of Transcript references (CrsCode, Sem) of Teaching
In general, when R1(a1, ..., an) references R2(b1, ..., bn):
There exists a 1-1 correspondence between a1, ..., an and b1, ..., bn
ai and bi must have the same base domains (although not necessarily the same names)
b1, ..., bn is a candidate key of R2
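The behaviour described above (a matching row must exist, a null foreign key is allowed, and unreferenced rows in R2 are fine) can be observed with Python's built-in sqlite3 module. This is a sketch, not a statement about any particular production DBMS: the column types and the sample rows are assumptions, and SQLite only enforces foreign keys after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

conn.execute("CREATE TABLE Department (DeptId TEXT PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE Professor (
        Id     INTEGER PRIMARY KEY,
        Name   TEXT,
        DeptId TEXT REFERENCES Department (DeptId)
    )
""")

conn.execute("INSERT INTO Department VALUES ('CSC', 'Computer Science')")
conn.execute("INSERT INTO Department VALUES ('ART', 'Art')")        # never referenced: fine
conn.execute("INSERT INTO Professor VALUES (1, 'Tanaka', 'CSC')")   # matching row exists
conn.execute("INSERT INTO Professor VALUES (2, 'Adams', NULL)")     # null FK is allowed

try:
    # No Department row has DeptId 'MATH', so this violates referential integrity.
    conn.execute("INSERT INTO Professor VALUES (3, 'Byrne', 'MATH')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("dangling reference rejected:", rejected)
```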
Types of Integrity
Data Integrity
Data Integrity validates the data before getting stored in the columns of the table.
SQL Server supports four types of data integrity:
Entity Integrity
Entity Integrity can be enforced through indexes, UNIQUE constraints and PRIMARY KEY
constraints.
Domain Integrity
Domain integrity validates data for a column of the table.

It can be enforced using:
Foreign key constraints,
Check constraints,
Default definitions
NOT NULL.
Referential Integrity
FOREIGN KEY and CHECK constraints are used to enforce Referential Integrity.
User-Defined Integrity
It enables you to create business logic which is not possible to develop using system constraints.
You can use stored procedure, trigger and functions to create user-defined integrity.
EF Codd Rules
A relational database management system (RDBMS) is a database management system
(DBMS) that is based on the relational model as introduced by E. F. Codd. Most popular
commercial and open source databases currently in use are based on the relational model.
A short definition of an RDBMS may be a DBMS in which data is stored in the form of tables
and the relationship among the data is also stored in the form of tables.
E.F. Codd, the famous mathematician, introduced 12 rules for the relational model for
databases, commonly known as Codd's rules. The rules define what is required for a DBMS to be
considered relational, i.e., an RDBMS. There is also one more rule, Rule 0, which specifies
that a relational system should use its relational facilities to manage the database. The
rules and their descriptions are as follows:
Rule 0: Foundation Rule
A relational database management system should be capable of using its relational facilities
(exclusively) to manage the database.
Rule 1: Information Rule
All information in the database is to be represented in one and only one way. This is achieved by
values in column positions within rows of tables.
Rule 2: Guaranteed Access Rule
All data must be accessible with no ambiguity; that is, each and every datum (atomic value) is
guaranteed to be logically accessible by resorting to a combination of table name, primary key
value and column name.
Rule 3: Systematic treatment of null values
Null values (distinct from empty character string or a string of blank characters and distinct from
zero or any other number) are supported in the fully relational DBMS for representing missing
information in a systematic way, independent of data type.
Rule 4: Dynamic On-line Catalog Based on the Relational Model

The database description is represented at the logical level in the same way as ordinary data, so
authorized users can apply the same relational language to its interrogation as they apply to
regular data. The authorized users can access the database structure by using common language
i.e. SQL.
Rule 5: Comprehensive Data Sublanguage Rule
A relational system may support several languages and various modes of terminal use. However,
there must be at least one language whose statements are expressible, per some well-defined
syntax, as character strings and whose ability to support all of the following is comprehensive:
a. data definition
b. view definition
c. data manipulation (interactive and by program)
d. integrity constraints
e. authorization
f. Transaction boundaries (begin, commit, and rollback).
Rule 6: View Updating Rule

All views that are theoretically updateable are also updateable by the system.
Rule 7: High-level Insert, Update, and Delete
The system must support set-at-a-time insert, update and delete operations, not only retrieval.
A table can be handled as a single operand, so these operations can act on multiple rows at once.
Rule 8: Physical Data Independence
Application programs and terminal activities remain logically unimpaired whenever any changes
are made in either storage representation or access methods.
Rule 9: Logical Data Independence
Application programs and terminal activities remain logically unimpaired when information
preserving changes of any kind that theoretically permit unimpairment are made to the base
tables.
Rule 10: Integrity Independence

Integrity constraints specific to a particular relational database must be definable in the relational
data sublanguage and storable in the catalog, not in the application programs.
Rule 11: Distribution Independence
The data manipulation sublanguage of a relational DBMS must enable application programs and
terminal activities to remain logically unimpaired whether and whenever data are physically
centralized or distributed.
Rule 12: Nonsubversion Rule
If a relational system has or supports a low-level (single-record-at-a-time) language, that
low-level language cannot be used to subvert or bypass the integrity rules or constraints
expressed in the higher-level (multiple-records-at-a-time) relational language.
On the basis of the above rules, there is no fully relational DBMS available today.
Functional Dependencies
Objectives of Normalization
Develop a good description of the data, its relationships and constraints
Produce a stable set of relations that
Is a faithful model of the enterprise
Is highly flexible
Reduces redundancy-saves space and reduces inconsistency in data
Is free of update, insertion and deletion anomalies
Normal Forms
First normal form -1NF
Second normal form-2NF
Third normal form-3NF
Boyce-Codd normal form-BCNF
Fourth normal form-4NF
Fifth normal form-5NF
Domain/Key normal form-DKNF
Each is contained within the previous form; each has stricter rules than the previous form
Limitations of E-R Designs

E-R modeling provides a set of guidelines, but does not result in a unique database schema.
Nor does it provide a way of evaluating alternative schemas.
Normalization theory provides a mechanism for analyzing and refining the schema produced by
an E-R design, or any other design.
Redundancy
Dependencies between attributes within a relation cause redundancy
Ex. All addresses in the same town have the same zip code
SSN    Name    Town         Zip
1234   Joe     Huntingdon   16652
2345   Mary    Huntingdon   16652
3456   Tom     Huntingdon   16652
5948   Harry   Alexandria   16603
There's clearly redundant information stored here.

Consistency and integrity are harder to maintain even in this simple example, e.g., ensuring
that the zip code always refers to the same city and that the city is spelled consistently.
Note we don't have a zip code to city fact stored unless there is a person from that zip code
Redundancy and Other Problems

Set-valued or multi-valued attributes in the E-R diagram result in multiple rows in corresponding
table
Example: Person (SSN, Name, Address, Hobbies)
A person entity with multiple hobbies yields multiple rows in table Person
Hence, the association between Name and Address for the same person is
stored redundantly
SSN is key of entity set, but (SSN, Hobbies) is key of corresponding relation below
The relation Person can't describe people without hobbies, but more important is the
replication of what would be the key value
SSN
1111
1111
2222
Anomalies
An anomaly is an inconsistent, incomplete, or contradictory state of the database
Insertion anomaly: the user is unable to insert a new record when it should be possible to do
so, because not all other information is available.
Deletion anomaly: when a record is deleted, other information that is tied to it is also deleted
Update anomaly: a record is updated, but other appearances of the same items are not updated
Redundancy leads to the following anomalies:

Update anomaly: A change in Address must be made in several places. Updating one fact may
require updating multiple tuples.
Deletion anomaly: Deleting one fact may delete other information. Suppose a person gives up
all hobbies. Do we:
Set Hobby attribute to null? No, since Hobby is part of key
Delete the entire row? No, since we lose other information in the row
Insertion anomaly: To record one fact may require more information than is available. Hobby
value must be supplied for any inserted row since Hobby is part of key
Decomposition
Solution: use two relations to store Person information
Person1 (SSN, Name, Address)
Hobbies (SSN, Hobby)
The decomposition is more general: people without hobbies can now be described
No update anomalies:
Name and address stored once

A hobby can be separately supplied or deleted
Decomposition is the process of breaking a relation into two or more relations to eliminate the
redundancies and corresponding anomalies.
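The effect of this decomposition can be sketched with plain Python sets. The names, addresses and hobbies below are invented for illustration (only the SSN values 1111 and 2222 appear in the notes); the point is that after the split an address is stored once per person and a person with no hobbies can still be recorded.

```python
# Single-relation design: Person(SSN, Name, Address, Hobby), hypothetical rows
person = {
    ("1111", "Joe", "123 Main", "biking"),
    ("1111", "Joe", "123 Main", "hiking"),
    ("2222", "Mary", "321 Elm", "lacrosse"),
}

# Decomposition into Person1(SSN, Name, Address) and Hobbies(SSN, Hobby)
person1 = {(ssn, name, addr) for ssn, name, addr, _hobby in person}
hobbies = {(ssn, hobby) for ssn, _name, _addr, hobby in person}

# No insertion anomaly: a person without hobbies needs no Hobby value.
person1.add(("3333", "Tom", "9 Oak"))

# No update anomaly: Joe's address changes in exactly one tuple.
person1 = {(s, n, "55 Pine" if s == "1111" else a) for s, n, a in person1}

# No deletion anomaly: dropping Mary's only hobby keeps her name and address.
hobbies.discard(("2222", "lacrosse"))

print(sorted(person1))
print(sorted(hobbies))
```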
Normalization Theory
The result of E-R analysis needs further refinement.
Appropriate decomposition can solve problems. What is appropriate?
The underlying theory is referred to as normalization theory and is based on functional
dependencies (and other kinds, like multivalued dependencies)
Informal Guidelines for Relation Design

Want to keep the semantics of the relation attributes clear. The information in a tuple should
represent exactly one fact or an entity. The hidden or buried entities are what we want to
discover and eliminate.
Design a relation schema so that it is easy to explain its meaning.
Do not combine attributes from multiple entity types and relationship types
into a single relation. Use a view if you want to present a simpler layout to
the end user.
A relation schema should correspond to one entity type or relationship type.
Minimize redundant information in tuples, thus reducing update anomalies
If anomalies are present, try to decompose the relation into two or more to
represent the separate facts, or document the anomalies well for
management in the applications programs.
Minimize the use of null values. Nulls have multiple interpretations:
The attribute does not apply to this tuple
The attribute value is unknown
The attribute value is absent
The attribute value might represent an actual value
If nulls are likely (non-applicable) then consider decomposition of the relation into two or more
relations that hold only the non-null valued tuples.
Do not permit the creation of spurious tuples
Too much decomposition of relations into smaller ones may also lose information or generate
erroneous information
Be sure that relations can be logically joined using natural join and the result
doesn't generate relationships that don't exist
Functional Dependencies
FDs are constraints on well-formed relations and represent a formalism on the
infrastructure of a relation.
Definition: A functional dependency (FD) on a relation schema R is a constraint X → Y, where
X and Y are subsets of attributes of R.
Definition: an FD is a relationship between an attribute "Y" and a determinant (1 or more other
attributes) "X" such that for a given value of a determinant the value of the attribute is uniquely
defined.
X is a determinant
X determines Y
Y is functionally dependent on X
X → Y
X → Y is trivial if Y ⊆ X
Definition: An FD X → Y is satisfied in an instance r of R if for every pair of tuples, t and s: if t
and s agree on all attributes in X then they must agree on all attributes in Y
A key constraint is a special kind of functional dependency: all attributes of relation occur on the
right-hand side of the FD:
SSN → SSN, Name, Address
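The "satisfied in an instance" definition translates almost word for word into code. The sketch below assumes rows are dictionaries keyed by attribute name; the helper name satisfies_fd is illustrative, and the data is the SSN/Name/Town/Zip table from the Redundancy section. As with keys, an instance can refute an FD but cannot prove that it holds for the schema.

```python
from itertools import combinations


def satisfies_fd(rows, lhs, rhs):
    """True if every pair of rows that agrees on all of lhs also agrees on all of rhs."""
    for t, s in combinations(rows, 2):
        agree_on_lhs = all(t[a] == s[a] for a in lhs)
        agree_on_rhs = all(t[b] == s[b] for b in rhs)
        if agree_on_lhs and not agree_on_rhs:
            return False
    return True


people = [
    {"SSN": "1234", "Name": "Joe", "Town": "Huntingdon", "Zip": "16652"},
    {"SSN": "2345", "Name": "Mary", "Town": "Huntingdon", "Zip": "16652"},
    {"SSN": "3456", "Name": "Tom", "Town": "Huntingdon", "Zip": "16652"},
    {"SSN": "5948", "Name": "Harry", "Town": "Alexandria", "Zip": "16603"},
]

print(satisfies_fd(people, ["Town"], ["Zip"]))                  # Town -> Zip
print(satisfies_fd(people, ["Zip"], ["Name"]))                  # Zip -> Name
print(satisfies_fd(people, ["SSN"], ["Name", "Town", "Zip"]))   # key constraint as an FD
```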
Example Functional Dependencies

Let R be
NewStudent(stuId, lastName, major, credits, status, socSecNo)
FDs in R include
{stuId} → {lastName}, but not the reverse
{stuId} → {lastName, major, credits, status, socSecNo, stuId}
{socSecNo} → {stuId, lastName, major, credits, status, socSecNo}
{credits} → {status}, but not {status} → {credits}
ZipCode → AddressCity
16652 is Huntingdon's ZIP
ArtistName → BirthYear
Picasso was born in 1881
Autobrand → Manufacturer, Engine type
Pontiac is built by General Motors with gasoline engine
Author, Title → PublDate
Shakespeare's Hamlet was published in 1600
Trivial Functional Dependency
The FD X → Y is trivial if Y is a subset of X
Examples: If A and B are attributes of R,
{A} → {A}
{A,B} → {A}
{A,B} → {B}
{A,B} → {A,B}
are all trivial FDs and will not contribute to the evaluation of normalization.
FD Axioms
Understanding: Functional dependencies are recognized by analysis of the real world; there is no
automation or algorithm for this. Finding or recognizing them is the database designer's task.
FD manipulations:
Soundness -- no incorrect FD's are generated
Completeness -- all FD's can be generated
Axiom Name / Axiom / Example
Reflexivity
    Axiom: if a is a set of attributes and b ⊆ a, then a → b
    Example: SSN,Name → SSN
Augmentation
    Axiom: if a → b holds and c is a set of attributes, then ca → cb
    Example: SSN → Name then SSN,Phone → Name,Phone
Transitivity
    Axiom: if a → b holds and b → c holds, then a → c holds
    Example: SSN → Zip and Zip → City then SSN → City
Union or Additivity *
    Axiom: if a → b and a → c hold then a → bc holds
    Example: SSN → Name and SSN → Zip then SSN → Name,Zip
Decomposition or Projectivity *
    Axiom: if a → bc holds then a → b and a → c hold
    Example: SSN → Name,Zip then SSN → Name and SSN → Zip
Pseudotransitivity *
    Axiom: if a → b and cb → d hold then ac → d holds
    Example: Address → Project and Project,Date → Amount then Address,Date → Amount
(NOTE) ab → c does NOT imply a → b and b → c
Reflexivity, augmentation and transitivity are Armstrong's Axioms (the basic axioms); the rules
marked * are derived from them.
Closure
Find all FD's for attributes a in a relation R
a+ denotes the set of attributes that are functionally determined by a
If attribute(s) a is/are a superkey of R, then a+ should be the whole relation R. This is our
goal. Any attribute in a relation that is not part of the closure indicates a problem with the
design.
Algorithm for Closure
result := a; // start with superkey a
WHILE (more changes to result) DO
    FOREACH (FD b → c in R) DO
        IF b ⊆ result
        THEN result := result ∪ c
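The closure algorithm above can be written as a short runnable sketch. It assumes each FD is given as a (left side, right side) pair of attribute sets; the function name closure and that representation are choices made here, not part of the notes. The FDs used are a reduced set taken from the NewStudent example, arranged so that status is reached only through credits.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:                       # WHILE (more changes to result)
        changed = False
        for lhs, rhs in fds:             # FOREACH FD b -> c
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)       # result := result UNION c
                changed = True
    return result


# FDs from NewStudent(stuId, lastName, major, credits, status, socSecNo)
fds = [
    ({"stuId"}, {"lastName", "major", "credits", "socSecNo"}),
    ({"socSecNo"}, {"stuId"}),
    ({"credits"}, {"status"}),
]
all_attrs = {"stuId", "lastName", "major", "credits", "status", "socSecNo"}

print(closure({"stuId"}, fds) == all_attrs)   # stuId is a superkey
print(sorted(closure({"credits"}, fds)))      # credits is not
```

Because closure({"stuId"}, fds) covers every attribute, stuId is a superkey; the same holds for socSecNo, which reaches the rest through stuId.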
Normalization
Process to revise relational schemas to hold desirable properties
1NF, 2NF, 3NF, BCNF, 4NF, 5NF
Properties of bad design
Repetition of information
Inability to represent certain information
Loss of information
Decomposition
Replace an "unnormalized" relation by a set of normalized relations
If R is a relation scheme then
{R1, R2, ..., Rn} is a decomposition
if R = R1 ∪ R2 ∪ ... ∪ Rn
Desirable properties of decomposition
Lossless join decomposition

Dependency preservation
all FDs are represented in the resulting relations
Minimum repetition of information
Lossless Join Decomposition

If r is a relation on scheme R
and ri is the projection of r onto Ri, then
r is a subset of the natural join of the ri's
A lossless join decomposition is one in which the ri's, when joined, produce exactly r,
i.e., no spurious tuples are generated and none are lost.
Consider the following relation
enroll (stId, crsNo, dateEnrolled, roomNo, instructor)

Suppose we decompose the above relation into two relations enroll1 and enroll2 as follows
enroll1 (stId, crsNo, dateEnrolled)
enroll2 (dateEnrolled, roomNo, instructor)
There are many problems with this decomposition but we focus on one aspect at the moment. Let
an instance of the relation enroll be
stId     crsNo   dateEnrolled   roomNo   instructor
830057   CP302   1FEB2004       MP006    Gupta
830057   CP303   1FEB2004       MP006    Jones
820159   CP302   10JAN2004      MP006    Gupta
825678   CP304   1FEB2004       CE122    Wilson
826789   CP305   15JAN2004      EA123    Smith
and let the decomposed relations enroll1 and enroll2 be:

enroll1
stId     crsNo   dateEnrolled
830057   CP302   1FEB2004
830057   CP303   1FEB2004
820159   CP302   10JAN2004
825678   CP304   1FEB2004
826789   CP305   15JAN2004
enroll2
dateEnrolled   roomNo   instructor
1FEB2004       MP006    Gupta
1FEB2004       MP006    Jones
10JAN2004      MP006    Gupta
1FEB2004       CE122    Wilson
15JAN2004      EA123    Smith
All the information that was in the relation enroll appears to be still available in enroll1 and
enroll2 but this is not so. Suppose, we wanted to retrieve the student numbers of all students
taking a course from Wilson, we would need to join enroll1 and enroll2. The join would have 11
tuples as follows:
stId     crsNo   dateEnrolled   roomNo   instructor
830057   CP302   1FEB2004       MP006    Gupta
830057   CP302   1FEB2004       MP006    Jones
830057   CP303   1FEB2004       MP006    Gupta
830057   CP303   1FEB2004       MP006    Jones
830057   CP302   1FEB2004       CE122    Wilson
830057   CP303   1FEB2004       CE122    Wilson
(add further tuples ...)

The join contains tuples that were not in the original!
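The example can be replayed with Python sets to count the spurious tuples. This is a sketch: relations are modelled as sets of tuples, the projections and the natural join on dateEnrolled are written out by hand, and the variable names are chosen here for illustration.

```python
enroll = {
    ("830057", "CP302", "1FEB2004", "MP006", "Gupta"),
    ("830057", "CP303", "1FEB2004", "MP006", "Jones"),
    ("820159", "CP302", "10JAN2004", "MP006", "Gupta"),
    ("825678", "CP304", "1FEB2004", "CE122", "Wilson"),
    ("826789", "CP305", "15JAN2004", "EA123", "Smith"),
}

# Projections onto the two decomposed schemas
enroll1 = {(st, crs, date) for st, crs, date, _room, _instr in enroll}
enroll2 = {(date, room, instr) for _st, _crs, date, room, instr in enroll}

# Natural join on the only shared attribute, dateEnrolled
joined = {
    (st, crs, date, room, instr)
    for st, crs, date in enroll1
    for date2, room, instr in enroll2
    if date == date2
}

spurious = joined - enroll
wilson_students = {st for st, _crs, _date, _room, instr in joined if instr == "Wilson"}

print(len(joined), "tuples in the join,", len(spurious), "of them spurious")
print(sorted(wilson_students))  # the join wrongly adds 830057 to Wilson's students
```

The original tuples all survive (r is a subset of the join), but the extra tuples mean the join no longer answers the Wilson query correctly, which is why the decomposition is lossy.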
Null Values in Tuples

Relations should be designed such that their tuples will have as few NULL values as possible
Attributes that are NULL frequently could be placed in separate relations (with the
primary key)
Reasons for nulls:
attribute not applicable or invalid
attribute value unknown (may exist)
value known to exist, but unavailable
Spurious Tuples
Bad designs for a relational database may result in erroneous results for certain JOIN
operations
The "lossless join" property is used to guarantee meaningful results for join operations
The relations should be designed to satisfy the lossless join condition. No spurious tuples
should be generated by doing a natural-join of any relations.
There are two important properties of decompositions:
(a) the non-additive (lossless) property of the corresponding join
(b) preservation of the functional dependencies.

Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less
stringent and may be sacrificed.
First Normal Form (1NF): Disallows composite attributes, multivalued attributes, and nested
relations; attributes whose values for an individual tuple are non-atomic
flat file
no repeating fields, sets or lists--atomic or single-valued values
no missing values
by definition of relations, all relations are 1NF
Here we can see that there is a non-atomic value {Bellaire, Sugarland, Houston}. As
in fig. c, the values can be written as separate tuples
(or)
the schema can be converted into 2 relations as below
{Dnumber, Dname, Dmgrssn}
{Dnumber, Dlocation}
(or)
the schema can be converted into one relation with 3 columns, one for each location value. But
the table below will result in null values for other depts
{Dnumber, Dname, Dmgrssn, Dlocation1, Dlocation2, Dlocation3}
The 2nd option is considered best because it reduces redundancy and does not
introduce null values
Counter-Example for 1NF
See Figure 5.4(a) NewStu Table (Assume students can have double majors)
Stuid   lastName   major          credits   status     socSecNo
S1001   Smith      History        90        Senior     100429500
S1003   Jones      Math           95        Senior     010124567
S1006   Lee        CSC, Math      15        Freshman   088520876
S1010   Burns      Art, English   63        Junior     099320985
S1060   Jones      CSC            25        Freshman   064624738
NewStu(StuId, lastName, major, credits, status, socSecNo) Assume students can have more
than one major
1NF Decomposition
The major attribute is not single-valued for each tuple
Ensuring 1NF
Best solution: For each multi-valued attribute, create a new table, in which you place the key of
the original table and the multi-valued attribute. Keep the original table, with its key
NewStu2(stuId, lastName, credits,status, socSecNo)
Majors(stuId, major)
stuId   lastName  credits  status    socSecNo
S1001   Smith     90       Senior    100429500
S1003   Jones     95       Senior    010124567
S1006   Lee       15       Freshman  088520876
S1010   Burns     63       Junior    099320985
S1060   Jones     25       Freshman  064624738

stuId   major
S1001   History
S1003   Math
S1006   CSC
S1006   Math
S1010   Art
S1010   English
S1060   CSC
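The split into NewStu2 and Majors can be sketched as follows. The rows are those of the NewStu table shown earlier; the function name to_1nf and the use of a Python list to model the multi-valued major attribute are assumptions made for this illustration.

```python
# NewStu rows; major is a list because a student may have more than one major.
new_stu = [
    ("S1001", "Smith", ["History"], 90, "Senior", "100429500"),
    ("S1003", "Jones", ["Math"], 95, "Senior", "010124567"),
    ("S1006", "Lee", ["CSC", "Math"], 15, "Freshman", "088520876"),
    ("S1010", "Burns", ["Art", "English"], 63, "Junior", "099320985"),
    ("S1060", "Jones", ["CSC"], 25, "Freshman", "064624738"),
]

def to_1nf(rows):
    """Move the multi-valued major attribute into its own relation keyed by stuId."""
    new_stu2 = []  # original table without the multi-valued attribute
    majors = []    # one (stuId, major) tuple per major
    for stu_id, last_name, major_list, credits, status, ssn in rows:
        new_stu2.append((stu_id, last_name, credits, status, ssn))
        for major in major_list:
            majors.append((stu_id, major))
    return new_stu2, majors

new_stu2, majors = to_1nf(new_stu)
for row in majors:
    print(row)
```

Every value in both resulting relations is atomic, and joining them on stuId recovers each student's majors.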
Another method for 1NF

If the number of repeats is limited, make additional columns for multiple values
Student(stuId, lastName, major1, major2, credits, status, socSecNo)
stuId   lastName  major1   major2   credits  status    socSecNo
S1001   Smith     History           90       Senior    100429500
S1003   Jones     Math              95       Senior    010124567
S1006   Lee       CSC      Math     15       Freshman  088520876
S1010   Burns     Art      English  63       Junior    099320985
S1060   Jones     CSC               25       Freshman  064624738
What is Full Functional Dependency

In relation R, a set of attributes B is fully functionally dependent on a set of attributes A if B is
functionally dependent on A but not functionally dependent on any proper subset of A
This means every attribute in A is needed to functionally determine B
Partial Functional Dependency Example
NewClass(courseNo, stuId, stuLastName, facId, schedule, room, grade)
FDs:
{courseNo, stuId} -> {lastName}
{courseNo, stuId} -> {facId}
{courseNo, stuId} -> {schedule}
{courseNo, stuId} -> {room}
{courseNo, stuId} -> {grade}
courseNo -> facId //** partial FD
courseNo -> schedule //** partial FD
courseNo -> room //** partial FD
stuId -> lastName //** partial FD
plus trivial FDs that are partial
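A minimal sketch of finding the partial dependencies mechanically, assuming the FDs listed for NewClass. It uses the standard attribute-closure computation; the function names closure and partial_dependencies are illustrative, not from the notes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, fds):
    """List (subkey, attribute) pairs where a non-key attribute depends on part of the key."""
    found = []
    key = sorted(key)
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for sub in combinations(key, size):
            for attr in sorted(closure(sub, fds) - set(key)):
                found.append((sub, attr))
    return found

fds = [
    (frozenset({"courseNo"}), frozenset({"facId", "schedule", "room"})),
    (frozenset({"stuId"}), frozenset({"lastName"})),
    (frozenset({"courseNo", "stuId"}), frozenset({"grade"})),
]
partials = partial_dependencies({"courseNo", "stuId"}, fds)
for sub, attr in partials:
    print(sub, "->", attr, "(partial)")
```

Only grade needs the whole key, so it is the only non-key attribute that is fully functionally dependent on {courseNo, stuId}.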
Second Normal Form (2NF)

A relation is in second normal form (2NF) if it is in first normal
form and all the non-key attributes are fully functionally
dependent on the key.
1NF
each non-key attribute is functionally dependent on a candidate key
if the key is composite, no FD exists between a non-key attribute and a subkey (just part
of the key)
If key has only one attribute, and R is 1NF, R is automatically 2NF

2NF is a scientific accident; has little practical or theoretical value
2NF Decomposition
Converting to 2NF
Identify each partial FD
Remove the attributes that depend on each of the determinants so identified
Place these determinants in separate relations along with their dependent attributes
In original relation keep the composite key and any attributes that are fully
functionally dependent on all of it
Even if the composite key has no dependent attributes, keep that relation to
connect logically the others
Example
The EMP_PROJ relation in Figure 15.3(b) is in 1NF but is not in 2NF. The nonprime attribute Ename
violates 2NF because of FD2, as do the nonprime attributes Pname and Plocation because of FD3. The
functional dependencies FD2 and FD3 make Ename, Pname, and Plocation partially dependent on the primary
key {Ssn, Pnumber} of EMP_PROJ, thus violating the 2NF test
Another 2NF Example
NewClass(courseNo, stuId, stuLastName, facId, schedule, room, grade )

FDs grouped by determinant:
{courseNo} -> {courseNo, facId, schedule, room}
{stuId} -> {stuId, lastName}
{courseNo, stuId} -> {courseNo, stuId, facId, schedule, room, lastName, grade}
Create tables grouped by determinants:
Course(courseNo,facId, schedule, room)
Stu(stuId, lastName)
Keep relation with original composite key, with attributes FD on it, if any
NewStu2( courseNo, stuId, grade)
What is Transitive Dependency?

If A, B, and C are attributes of relation R, such that A -> B and B -> C, then C is transitively
dependent on A
Example:
NewStudent (stuId, lastName, major, credits, status)
FD:
credits -> status
By transitivity:
stuId -> credits and credits -> status implies stuId -> status
Transitive dependencies cause update, insertion, deletion anomalies.
Third Normal Form (3NF) : A relation schema R is in third normal form (3NF) if it is in 2NF
and no non-prime attribute A in R is transitively dependent on the primary key
Also, a relation is in third normal form (3NF) if, whenever a non-trivial functional dependency
X -> A exists, either X is a superkey or A is a member of some candidate key
To be 3NF, relation must be 2NF and have no transitive dependencies
No non-key attribute determines another non-key attribute. Here key includes candidate key
3NF Decomposition
Remove the dependent attribute, status, from the relation
Create a new table with the dependent attribute and its determinant, credits
Keep the determinant in the original table
Example
The relation schema EMP_DEPT in Figure 15.3(a) is in 2NF, since no partial dependencies on a key exist.
However, EMP_DEPT is not in 3NF because of the transitive dependency of Dmgr_ssn (and also Dname) on
Ssn via Dnumber. We can normalize EMP_DEPT by decomposing it into two 3NF relations
Another example
PRESIDENTS(Pres, Spouse, Party, Founded)
Pres -> Spouse
Pres -> Party
Party -> Founded
decomposes into
PRESIDENTS(Pres, Spouse, Party)
PARTIES(Party, Founded)
NewStudent(stuId, lastName, major, credits, status)
credits -> status
decomposes into
NewStu2(stuId, lastName, major, credits)
Stats(credits, status)
Boyce/Codd Normal Form (BCNF)

A relation is in Boyce/Codd Normal Form (BCNF) if whenever a non-trivial functional
dependency X -> A exists, then X is a superkey
Stricter than 3NF, which allows A to be part of a candidate key
If there is just one single candidate key, the forms are equivalent
3NF
for all FD's each determinant is a candidate key
3NF relations are BCNF if there is only one candidate key and the key is not
composite
Generally can reach 3NF or BCNF immediately
Example
Suppose that we have thousands of lots in the relation but the lots are from only two counties: DeKalb and
Fulton. Suppose also that lot sizes in DeKalb County are only 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0 acres, whereas lot
sizes in Fulton County are restricted to 1.1, 1.2, ..., 1.9, and 2.0 acres. In such a situation we would have the
additional functional dependency FD5: Area -> County_name. Because Area is not a superkey of the relation,
FD5 violates BCNF, and the relation is decomposed into two new relations, LOTS1AX and LOTS1AY
Another Example
NewFac (facName, dept, office, rank, dateHired)

FDs:
office -> dept
facName, dept -> office, rank, dateHired
facName, office -> dept, rank, dateHired
NewFac is not BCNF because office is not a superkey
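A minimal sketch of the BCNF test applied to NewFac, assuming the three FDs listed above are the complete set. The helper names closure and bcnf_violations are illustrative; the check only examines the FDs it is given, not every FD they imply.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD, never a violation
        if closure(lhs, fds) != set(attributes):
            bad.append((lhs, rhs))
    return bad

new_fac = {"facName", "dept", "office", "rank", "dateHired"}
fds = [
    (frozenset({"office"}), frozenset({"dept"})),
    (frozenset({"facName", "dept"}), frozenset({"office", "rank", "dateHired"})),
    (frozenset({"facName", "office"}), frozenset({"dept", "rank", "dateHired"})),
]
violations = bcnf_violations(new_fac, fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
```

The closure of {office} is only {office, dept}, so office -> dept is the single violating FD; both composite determinants are superkeys.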
BCNF Decomposition Attempt

To make it BCNF, remove the dependent attributes to a new relation, with the determinant as the
key
Project into
Fac1 (office, dept)
Fac2 (facName, office, rank, dateHired)
Note that we have lost a functional dependency: in Fac2 we are no longer able to see that
{facName, dept} is a determinant, since facName and dept are now in different relations
A BCNF decomposition may not be dependency preserving, so we might have to settle for 3NF
Properties of Decompositions
Starting with a universal relation that contains all the attributes, we can decompose into relations
by projection
A decomposition of a relation R is a set of relations {R1,R2,...,Rn} such that each Ri is a subset of
R and the union of all of the Ri is R.
Desirable properties of decompositions:
Attribute preservation - every attribute is in some relation
Dependency preservation - see previous example
Lossless decomposition - discussed later
Dependency Preservation
If R is decomposed into {R1,R2,...,Rn} so that for each functional dependency X -> Y all the
attributes in X ∪ Y appear in the same relation, Ri, then all FDs are preserved
Allows DBMS to check each FD constraint by checking just one table for each
Attribute Preservation Condition

All attributes must be preserved through the process of normalization.
Start with universal relation schema R
R = {A1,A2,...,An}, the set of attributes
D is a decomposition of R such that
D = {R1,R2,...,Rm}
and R = R1 ∪ R2 ∪ ... ∪ Rm
Lossless Join Condition

A decomposition should not have spurious tuples generated when a natural join operation is
applied to the relations in the resulting decomposition
A decomposition (R1, ..., Rn) of a schema, R, is lossless if every valid instance, r, of R can be
reconstructed from its components through a natural join.
Each component ri is the projection of r onto Ri: ri = πRi(r)
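For a decomposition into two relations, a standard test when only FDs are considered is that the common attributes must functionally determine all of R1 or all of R2. A minimal sketch, applied to the PRESIDENTS example from the 3NF section; the second, lossy split is hypothetical and included only for contrast, and the function names are illustrative.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) functionally determines all of R1 or all of R2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure

fds = [
    ({"Pres"}, {"Spouse"}),
    ({"Pres"}, {"Party"}),
    ({"Party"}, {"Founded"}),
]

# Decomposition from the notes: PRESIDENTS(Pres, Spouse, Party), PARTIES(Party, Founded)
good = is_lossless_binary({"Pres", "Spouse", "Party"}, {"Party", "Founded"}, fds)

# Hypothetical poor split that shares only Spouse, which determines nothing else
bad = is_lossless_binary({"Pres", "Spouse"}, {"Spouse", "Party", "Founded"}, fds)

print("notes decomposition lossless:", good)
print("hypothetical split lossless:", bad)
```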
Lossless Join Decomposition Algorithm
1. set D := {R}
2. WHILE there exists a Q in D that is not in BCNF DO
Find an FD X -> Y in Q that violates BCNF
and replace Q in D by (Q - Y) and (X ∪ Y)
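The loop above can be sketched in Python as follows. This is an illustrative, brute-force version that searches all attribute subsets for a violating X (exponential, so suitable only for small schemas), and it takes Y to be everything in Q that X determines beyond X itself. The function names are assumptions; the data is the NewFac example from the BCNF section.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(q, fds):
    """Return (X, Y) where X -> Y holds on q and X is not a superkey of q, else None."""
    attrs = sorted(q)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            y = frozenset((closure(x, fds) & q) - x)
            if y and x | y != q:  # non-trivial, and X does not determine all of q
                return x, y
    return None

def bcnf_decompose(r, fds):
    d = [frozenset(r)]                       # 1. set D := {R}
    while True:
        for q in d:                          # 2. look for a Q in D that is not in BCNF
            violation = find_violation(q, fds)
            if violation:
                break
        else:
            return d                         # every Q is in BCNF
        x, y = violation
        d.remove(q)
        d += [q - y, x | y]                  # replace Q by (Q - Y) and (X ∪ Y)

fds = [
    (frozenset({"office"}), frozenset({"dept"})),
    (frozenset({"facName", "dept"}), frozenset({"office", "rank", "dateHired"})),
    (frozenset({"facName", "office"}), frozenset({"dept", "rank", "dateHired"})),
]
decomposition = bcnf_decompose({"facName", "dept", "office", "rank", "dateHired"}, fds)
for rel in decomposition:
    print(sorted(rel))
```

On NewFac this yields the same two relations as the projection into Fac1 and Fac2 shown earlier.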
MVD and Normalization Examples

Multivalued Dependencies
A multivalued dependency (MVD) X ->> Y specified on relation schema
R, where X and Y are both subsets of R, specifies the following constraint on
any relation state r of R: If two tuples t1 and t2 exist in r such that t1[X] = t2[X],
then two tuples t3 and t4 should also exist in r with the following properties,
where we use Z to denote (R - (X ∪ Y)):
t3[X] = t4[X] = t1[X] = t2[X].
t3[Y] = t1[Y] and t4[Y] = t2[Y].
t3[Z] = t2[Z] and t4[Z] = t1[Z].
An MVD X ->> Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X ∪ Y = R.
Inference Rules for Functional and Multivalued Dependencies:
IR1 (reflexive rule for FDs): If X ⊇ Y, then X -> Y.
IR2 (augmentation rule for FDs): {X -> Y} |= XZ -> YZ.
IR3 (transitive rule for FDs): {X -> Y, Y -> Z} |= X -> Z.
IR4 (complementation rule for MVDs): {X ->> Y} |= {X ->> (R - (X ∪ Y))}.
IR5 (augmentation rule for MVDs): If X ->> Y and W ⊇ Z, then WX ->> YZ.
IR6 (transitive rule for MVDs): {X ->> Y, Y ->> Z} |= X ->> (Z - Y).
IR7 (replication rule for FD to MVD): {X -> Y} |= X ->> Y.
IR8 (coalescence rule for FDs and MVDs): If X ->> Y and there exists
W with the properties that (a) W ∩ Y is empty, (b) W -> Z, and (c) Y ⊇ Z,
then X -> Z.
MVD Example
Course ->> Instructor
Course ->> Text
Course (X)  Instructor (Y)  Text (R - XY)
Intro       Kruse           Intro to CS
Intro       Wright          Intro to CS
CS1         Thomas          Intro to Java
CS1         Thomas          CS Theory Survey
CS2         Rhodes          Java Data Structures
CS2         Rhodes          Unix
CS2         Kruse           Java Data Structures
CS2         Kruse           Unix
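A minimal sketch that checks Course ->> Instructor on the rows of the table above. It uses an equivalent formulation of the t1..t4 definition: within each group of tuples sharing the same X value, every Y value must appear with every Z value. The function name mvd_holds is illustrative, and the version below assumes single-attribute X, Y and Z.

```python
from itertools import product

rows = [
    {"course": "Intro", "instructor": "Kruse", "text": "Intro to CS"},
    {"course": "Intro", "instructor": "Wright", "text": "Intro to CS"},
    {"course": "CS1", "instructor": "Thomas", "text": "Intro to Java"},
    {"course": "CS1", "instructor": "Thomas", "text": "CS Theory Survey"},
    {"course": "CS2", "instructor": "Rhodes", "text": "Java Data Structures"},
    {"course": "CS2", "instructor": "Rhodes", "text": "Unix"},
    {"course": "CS2", "instructor": "Kruse", "text": "Java Data Structures"},
    {"course": "CS2", "instructor": "Kruse", "text": "Unix"},
]

def mvd_holds(rows, x, y, z):
    """Check X ->> Y, where z names the one remaining attribute (R - XY)."""
    groups = {}
    for r in rows:
        groups.setdefault(r[x], set()).add((r[y], r[z]))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        # The MVD requires every Y value to be combined with every Z value.
        if pairs != set(product(ys, zs)):
            return False
    return True

holds = mvd_holds(rows, "course", "instructor", "text")
# Dropping one tuple (CS2, Kruse, Unix) breaks the required cross product.
holds_without_last = mvd_holds(rows[:-1], "course", "instructor", "text")
print(holds, holds_without_last)
```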
4NF : A relation schema R is in 4NF with respect to a set of dependencies F (that includes
functional dependencies and multivalued dependencies) if, for every nontrivial multivalued
dependency X ->> Y in F+, X is a superkey for R.
Equivalently, a relation R is in 4NF if, for every MVD in F+ of the form A ->> B, at least one of
the following holds:
A ->> B is a trivial MVD
A is a superkey
Example:
Decomposing a relation state of EMP that is not in 4NF:
(a) EMP relation with additional tuples.

(b) Two corresponding 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.
Dependency Preservation
If f is an FD in F, but f is not in F1 ∪ F2, there are two possibilities:
f ∈ (F1 ∪ F2)+
If the constraints in F1 and F2 are maintained, f will be maintained automatically.
f not in (F1 ∪ F2)+
f can be checked only by first taking the join of r1 and r2. This is costly.
Example
Schema (R, F) where
R = {SSN, Name, Address, Hobby}
F = {SSN -> Name, Address}
can be decomposed into
R1 = {SSN, Name, Address} F1 = {SSN -> Name, Address}
and R2 = {SSN, Hobby} F2 = { }
Since F = F1 ∪ F2 the decomposition is dependency preserving
Example
Schema: (ABC; F) with F = {A -> B, B -> C, C -> B}
Decomposition:
(AC, F1), F1 = {A -> C} Note: A -> C not in F, but is in F+
(BC, F2), F2 = {B -> C, C -> B}
A -> B not in (F1 ∪ F2), but A -> B ∈ (F1 ∪ F2)+.
So F+ = (F1 ∪ F2)+ and thus the decomposition is still dependency preserving
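The check f ∈ (F1 ∪ F2)+ can be sketched with attribute closure: an FD X -> Y is preserved if Y lies inside the closure of X computed from the projected FDs alone. The code below runs the ABC example and, for contrast, the NewFac decomposition from the BCNF section. The projected FD set for Fac1 and Fac2 was worked out by hand for this sketch and is an assumption; the function names are illustrative.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_preserved(fd, projected_fds):
    """True if fd follows from the FDs that can be checked on single relations."""
    lhs, rhs = fd
    return rhs <= closure(lhs, projected_fds)

# ABC example: F1 ∪ F2 still implies every FD in F.
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"B"})]
F1 = [({"A"}, {"C"})]
F2 = [({"B"}, {"C"}), ({"C"}, {"B"})]
abc_preserved = all(is_preserved(fd, F1 + F2) for fd in F)

# NewFac split into Fac1(office, dept) and Fac2(facName, office, rank, dateHired).
newfac_F = [
    ({"office"}, {"dept"}),
    ({"facName", "dept"}, {"office", "rank", "dateHired"}),
    ({"facName", "office"}, {"dept", "rank", "dateHired"}),
]
projected = [
    ({"office"}, {"dept"}),                          # checkable on Fac1
    ({"facName", "office"}, {"rank", "dateHired"}),  # checkable on Fac2
]
lost = [fd for fd in newfac_F if not is_preserved(fd, projected)]

print("ABC decomposition preserves F:", abc_preserved)
print("FDs lost by the NewFac decomposition:", lost)
```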
JOIN DEPENDENCY
A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema R,
specifies a constraint on the states r of R.
The constraint states that every legal state r of R should have a non-additive join
decomposition into R1, R2, ..., Rn; that is, for every such r we have
* (πR1(r), πR2(r), ..., πRn(r)) = r
Note: an MVD is a special case of a JD where n = 2.
A join dependency JD(R1, R2, ..., Rn), specified on relation schema R, is a trivial JD if
one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R.
5NF : A relation schema R is in fifth normal form (5NF) (or Project-Join Normal Form (PJNF))
with respect to a set F of functional, multivalued, and join dependencies if, for every
nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a
superkey of R.
In the above the supply relation is decomposed into 3 relations which are now in 5NF. If we
apply a join on any two relations it will produce spurious tuples. But applying a join to all 3
together will not do so. Which means there is a join dependency between the 3 relations
Minimal Cover
A minimal cover of a set of dependencies, T, is a set of dependencies, U, such that:
U is equivalent to T (T+ = U+)
All FDs in U have the form X -> A where A is a single attribute
It is not possible to make U smaller (while preserving equivalence) by
Deleting an FD
Deleting an attribute from an FD (either from LHS or RHS)
FDs and attributes that can be deleted in this way are called redundant
Computing Minimal Cover

Example: T = {ABH -> CK, A -> D, C -> E, BGH -> F, F -> AD, E -> F, BH -> E}
step 1: Make RHS of each FD into a single attribute
Algorithm: Use the decomposition inference rule for FDs
Example: F -> AD replaced by F -> A, F -> D; ABH -> CK by ABH -> C, ABH -> K
step 2: Eliminate redundant attributes from LHS.
Algorithm: If FD XB -> A ∈ T (where B is a single attribute) and X -> A is
entailed by T, then B was unnecessary
Example: Can an attribute be deleted from ABH -> C?
Compute the closures (AB)+, (AH)+, (BH)+ with respect to T.
Since C ∈ (BH)+, BH -> C is entailed by T and A is redundant in ABH -> C.
step 3: Delete redundant FDs from T
Algorithm: If T - {f} entails f, then f is redundant
If f is X -> A then check if A ∈ X+ computed with respect to T - {f}
Example: BGH -> F is entailed by E -> F, BH -> E, so it is redundant
Note: Steps 2 and 3 cannot be reversed!! See the textbook for a counterexample
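The three steps can be sketched in Python as follows, using the example set T. This is an illustrative version that assumes single-letter attribute names, so an FD can be written as a pair of strings; the function names are not from the notes. A minimal cover is not unique in general, so a different processing order may produce a different but equivalent result.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, given as (frozenset lhs, single attribute rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: make the RHS of each FD a single attribute.
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]

    # Step 2: drop LHS attributes that are not needed to derive the RHS.
    for i, (lhs, a) in enumerate(cover):
        for b in sorted(lhs):
            reduced = cover[i][0] - {b}
            if reduced and a in closure(reduced, cover):
                cover[i] = (reduced, a)
    cover = list(dict.fromkeys(cover))  # remove duplicates created by step 2

    # Step 3: drop FDs that are entailed by the remaining ones.
    for f in list(cover):
        rest = [g for g in cover if g != f]
        if f[1] in closure(f[0], rest):
            cover = rest
    return cover

T = [("ABH", "CK"), ("A", "D"), ("C", "E"), ("BGH", "F"),
     ("F", "AD"), ("E", "F"), ("BH", "E")]
cover = minimal_cover(T)
for lhs, a in cover:
    print("".join(sorted(lhs)), "->", a)
```

For this T the sketch keeps BH -> C, BH -> K, A -> D, C -> E, F -> A and E -> F.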
Domain-Key Normal Form (DKNF):

Definition:
A relation schema is said to be in DKNF if all constraints and dependencies that
should hold on the valid relation states can be enforced simply by enforcing the
domain constraints and key constraints on the relation.
The idea is to specify (theoretically, at least) the ultimate normal form that takes into
account all possible types of dependencies and constraints.
For a relation in DKNF, it becomes very straightforward to enforce all database
constraints by simply checking that each attribute value in a tuple is of the appropriate
domain and that every key constraint is enforced.
The practical utility of DKNF is limited
Normalization Drawbacks
By limiting redundancy, normalization helps maintain consistency and saves space
But performance of querying can suffer because related information that was stored in a single
relation is now distributed among several
Example: A join is required to get the names and grades of all students taking CS305 in S2002.
Denormalization
Tradeoff: Judiciously introduce redundancy to improve performance of certain queries
Example: Add attribute Name to Transcript
SELECT T.Name, T.Grade
FROM Transcript T
WHERE T.CrsCode = 'CS305' AND T.Semester = 'S2002'
Join is avoided
If queries are asked more frequently than Transcript is modified, added
redundancy might improve average performance
But, Transcript is no longer in BCNF since key is (StudId, CrsCode, Semester)
and StudId -> Name
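A small sketch of the tradeoff using Python's built-in sqlite3 module. The Student table, the TranscriptDenorm table name, and all rows are hypothetical and exist only to make the two queries runnable side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Transcript (
    StudId INTEGER, CrsCode TEXT, Semester TEXT, Grade TEXT,
    PRIMARY KEY (StudId, CrsCode, Semester));
-- Denormalized copy: Name is stored redundantly with every transcript row.
CREATE TABLE TranscriptDenorm (
    StudId INTEGER, CrsCode TEXT, Semester TEXT, Grade TEXT, Name TEXT,
    PRIMARY KEY (StudId, CrsCode, Semester));
INSERT INTO Student VALUES (1, 'Smith'), (2, 'Jones');
INSERT INTO Transcript VALUES
    (1, 'CS305', 'S2002', 'A'),
    (2, 'CS305', 'S2002', 'B'),
    (1, 'CS101', 'F2001', 'A');
INSERT INTO TranscriptDenorm
    SELECT T.StudId, T.CrsCode, T.Semester, T.Grade, S.Name
    FROM Transcript T JOIN Student S ON S.StudId = T.StudId;
""")

# Normalized schema: a join is required to get names with grades.
with_join = conn.execute("""
    SELECT S.Name, T.Grade
    FROM Transcript T JOIN Student S ON S.StudId = T.StudId
    WHERE T.CrsCode = 'CS305' AND T.Semester = 'S2002'
    ORDER BY S.Name
""").fetchall()

# Denormalized schema: the same answer from a single table.
without_join = conn.execute("""
    SELECT T.Name, T.Grade
    FROM TranscriptDenorm T
    WHERE T.CrsCode = 'CS305' AND T.Semester = 'S2002'
    ORDER BY T.Name
""").fetchall()

print(with_join)
print(without_join)
conn.close()
```

Both queries return the same rows; the cost of the second design is that a student's name must be kept consistent across every transcript row.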
Normalization Problems
1. Consider the following relation:
CAR_SALE(Car#, Date_sold, Salesman#, Commission%, Discount_amt)
Assume that a car may be sold by multiple salesmen and hence {Car#, Salesman#} is the
primary key. Additional dependencies are:
Date_sold -> Discount_amt
and
Salesman# -> Commission%
Based on the given primary key, is this relation in 1NF, 2NF, or 3NF? Why or why not? How would
you successively normalize it completely?
Answer:
Given the relation schema
Car_Sale(Car#, Salesman#, Date_sold, Commission%, Discount_amt)
with the functional dependencies
Date_sold -> Discount_amt
Salesman# -> Commission%
Car# -> Date_sold
This relation satisfies 1NF but not 2NF (Car# -> Date_sold and Car# -> Discount_amt,
so these two attributes are not FFD on the primary key) and not 3NF.
To normalize,
2NF:
Car_Sale1(Car#, Date_sold, Discount_amt)
Car_Sale2(Car#, Salesman#)
Car_Sale3(Salesman#,Commission%)
3NF:
Car_Sale1-1(Car#, Date_sold)
Car_Sale1-2(Date_sold, Discount_amt)
Car_Sale2(Car#, Salesman#)
Car_Sale3(Salesman#,Commission%)
2. Consider the following relation for published books:
BOOK (Book_title, Authorname, Book_type, Listprice, Author_affil, Publisher)
Author_affil refers to the affiliation of the author. Suppose the following dependencies exist:
Book_title -> Publisher, Book_type
Book_type -> Listprice
Authorname -> Author_affil
(a) What normal form is the relation in? Explain your answer.
(b) Apply normalization until you cannot decompose the relations further. State the reasons behind
each decomposition.
Answer:
Given the relation
Book(Book_title, Authorname, Book_type, Listprice, Author_affil, Publisher)
and the FDs
Book_title -> Publisher, Book_type
Book_type -> Listprice
Authorname -> Author_affil
(a) The key for this relation is {Book_title, Authorname}. This relation is in 1NF and not in
2NF as no attributes are FFD on the key. It is also not in 3NF.
(b) 2NF decomposition:
Book0(Book_title, Authorname)
Book1(Book_title, Publisher, Book_type, Listprice)
Book2(Authorname, Author_affil)
This decomposition eliminates the partial dependencies.
3NF decomposition:
Book0(Book_title, Authorname)
Book1-1(Book_title, Publisher, Book_type)
Book1-2(Book_type, Listprice)
Book2(Authorname, Author_affil)
This decomposition eliminates the transitive dependency of Listprice
3. Consider the following relation:
R (Doctor#, Patient#, Date, Diagnosis, Treat_code, Charge)
In this relation, a tuple describes a visit of a patient to a doctor along with a treatment code and daily
charge. Assume that diagnosis is determined (uniquely) for each patient by a doctor. Assume that
each treatment code has a fixed charge (regardless of patient). Is this relation in 2NF? Justify your
answer and decompose if necessary. Then argue whether further normalization to 3NF is
necessary, and if so, perform it.
Answer:
From the question's text, we can infer the following functional dependencies:
{Doctor#, Patient#, Date} -> {Diagnosis, Treat_code, Charge}
{Treat_code} -> {Charge}
Because there are no partial dependencies, the given relation is already in 2NF. It is, however,
not in 3NF because Charge is a nonkey attribute that is determined by another nonkey attribute,
Treat_code. We must decompose further:
R (Doctor#, Patient#, Date, Diagnosis, Treat_code)
R1 (Treat_code, Charge)
We could further infer that the treatment for a given diagnosis is functionally dependent, but we
should be sure to allow the doctor to have some flexibility when prescribing cures.
References:
1. Jain, V. K. Database Management Systems.
2. Elmasri, R., & Navathe, S. (1994). Fundamentals of Database Systems. 2nd ed. Redwood City, CA:
The Benjamin/Cummings Publishing Co. pp. 283-285.
3. Codd, E. (1985). "Is Your DBMS Really Relational?" and "Does Your DBMS Run By
the Rules?" ComputerWorld, October 14 and October 21.
4. http://jcsites.juniata.edu/faculty/rhodes/dbms/relnmodel.htm
