
Database Management System

Conceptual View

11/25/2008
Institute of Management Sciences
Muhammad Atif Nasim
Table of Contents
Database
Database Management Systems
Uses of databases
Types of Databases
  Delimited text files
  Comma-separated values (CSV) files
  Locking
  Complex data
  Efficiency
  Hierarchical Database Definition
  Network model
  Relational Database
  Object-Oriented Database Definition
Tables and relationships
Entity-Relationship Diagrams (ERD)
Data Flow Diagram (DFD)
  Guidelines
  Decomposition
  Symbols
    Data process
    Data store
    Actor
    Anchor
    Data flow
    Control flow
    Update flow
    Flow names and inheritance
    Data Flow Diagram Layers
    Context Diagrams
    DFD levels
Key
  Primary key
  Secondary/Foreign key
Database Normalization
  1. Eliminate Repeating Groups
  2. Eliminate Redundant Data
  3. Eliminate Columns Not Dependent On Key
    BCNF. Boyce-Codd Normal Form
  4. Isolate Independent Multiple Relationships
  5. Isolate Semantically Related Multiple Relationships
  6. Optimal Normal Form
  7. Domain-Key Normal Form
Components of DBMS
  Data dictionary/directory
  Data languages
  Teleprocessing monitors
  Application development system
  Security software
  Archiving and recovery system
  Report writers
  SQL and other Query languages
Data Redundancy
Data Integrity
  Cascade Updates and Deletes
  Business Rules and Levels of Enforcement
  Field Level Integrity
  Table Level Integrity
  Validation Tables
Database
A database is an organized collection of related information. The data stored in a
database is persistent.

Database Management Systems


A database management system (DBMS) is software or a collection of software which can be
used to create, maintain and work with databases.

A client/server database system is one in which the database is stored and managed by a
database server, and client software is used to request information from the server or to send
commands to the server.

Uses of databases
Databases are commonly used to store bodies of data which are too large to be managed on
paper or through simple spreadsheets.

Most businesses use databases for accounts, inventory, personnel, and other record keeping.

Databases are also becoming more widely used by home users for address books, CD
collections, recipe archives, etc.

There are very few fields in which databases cannot be used.

Types of Databases
• Flat-file text databases

• Hierarchical databases such as LDAP

• Network databases

• Relational databases

• Object Oriented databases

Delimited text files

A delimited text file is one in which each line of text is a record, and the fields are separated
by a known character. The character used to delimit the data varies according to the type of
data. Common delimiters include the tab character (\t in Perl) or various punctuation
characters. The delimiter should always be one which does not appear in the data.

Delimited text files are easily produced by most desktop spreadsheet and database
applications (e.g. Microsoft Excel, Microsoft Access). You can usually choose "File" then
"Save As" or "Export", then select the type of file you would like to save as.

Imagine a file which contains people's given names, surnames, and ages, delimited by the
pipe (|) symbol:

Fred|Flintstone|40
Wilma|Flintstone|36
Barney|Rubble|38
Betty|Rubble|34
Homer|Simpson|45
Marge|Simpson|39
Bart|Simpson|11
Lisa|Simpson|9

The file above is available in your exercises directory as delimited.txt.
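
As a minimal sketch of how such a file might be read, the Python snippet below splits each
record on the pipe character. It embeds the sample records in a string so that it runs without
delimited.txt; the function name parse_delimited and the field names are illustrative
assumptions, not part of the exercises.

```python
# Sample records from the text, embedded so the sketch runs without delimited.txt.
SAMPLE = """Fred|Flintstone|40
Wilma|Flintstone|36
Barney|Rubble|38
Betty|Rubble|34
Homer|Simpson|45
Marge|Simpson|39
Bart|Simpson|11
Lisa|Simpson|9
"""


def parse_delimited(text, delimiter="|"):
    """Split each non-empty line into (given_name, surname, age)."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        given_name, surname, age = line.split(delimiter)
        records.append((given_name, surname, int(age)))
    return records


people = parse_delimited(SAMPLE)
for given_name, surname, age in people:
    print(f"{given_name} {surname} is {age}")
```

A plain split like this only works because the delimiter never appears inside the data, which
is the condition stated above.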

Comma-separated values (CSV) files

Comma-separated values files are another format commonly produced by spreadsheet and
database programs. CSV files delimit their fields with commas, and wrap textual data in
quotation marks, allowing the textual data to contain commas if required:

"Fred","Flintstone",40
"Wilma","Flintstone",36
"Barney","Rubble",38
"Betty","Rubble",34
"Homer","Simpson",45
"Marge","Simpson",39
"Bart","Simpson",11
"Lisa","Simpson",9

CSV files are harder to parse than ordinary delimited text files, because a quoted field may
itself contain the delimiter. In Perl, one common way to parse them is the Text::ParseWords
module.
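
For comparison, here is a hedged sketch of the same task in Python, using the standard csv
module rather than Perl's Text::ParseWords. The function name parse_csv and the extra
"Flintstone, Fred" record are assumptions made for illustration.

```python
import csv
import io

# The sample records from the text.
SAMPLE_CSV = '''"Fred","Flintstone",40
"Wilma","Flintstone",36
"Barney","Rubble",38
"Betty","Rubble",34
"Homer","Simpson",45
"Marge","Simpson",39
"Bart","Simpson",11
"Lisa","Simpson",9
'''


def parse_csv(text):
    """Return a list of (given_name, surname, age) tuples."""
    rows = []
    for given_name, surname, age in csv.reader(io.StringIO(text)):
        rows.append((given_name, surname, int(age)))
    return rows


rows = parse_csv(SAMPLE_CSV)

# A quoted field may contain the delimiter itself; the reader keeps it intact.
tricky = next(csv.reader(io.StringIO('"Flintstone, Fred",40')))
print(rows[0], tricky)
```

The reader removes the quotation marks and does not split on commas that appear inside them,
which a plain split on "," would get wrong.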

Problems with flat file databases

Locking

When using flat file databases without locking, problems can occur if two or more people
open the files at the same time. This can cause data to be lost or corrupted.

If you are implementing a flat file database, you will need to handle file locking using Perl's
flock function.
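
The same idea can be sketched in Python, whose fcntl.flock call wraps the same operating
system facility on Unix-like systems (it is not available on Windows). The helper name
append_record and the temporary file are assumptions made for illustration; the lock is
advisory, so it only protects against other programs that also request it.

```python
import fcntl
import os
import tempfile


def append_record(path, record):
    """Append one line to the file while holding an exclusive advisory lock."""
    with open(path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # waits until no other process holds the lock
        try:
            handle.write(record + "\n")
            handle.flush()  # make sure the data is written before the lock is released
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


# Hypothetical data file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "people.txt")
append_record(path, "Fred|Flintstone|40")
append_record(path, "Wilma|Flintstone|36")

with open(path) as handle:
    lines = handle.read().splitlines()
print(lines)
```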

Complex data

If your data is more complex than a single table of scalar items, managing your flat file
database can become extremely tedious and difficult.

Efficiency

Flat file databases are very inefficient for large quantities of data. Searching, sorting, and
other simple activities can take a very long time and use a great deal of memory and other
system resources.

Hierarchical Database Definition

A hierarchical database is a kind of database management system that links records together
like a family tree, such that each record type has only one owner; for example, an order is
owned by only one customer. Hierarchical structures were widely used in the first mainframe
database management systems. However, due to their restrictions, they often cannot be used to
relate structures that exist in the real world.

Network model

The network model is a database model conceived as a flexible way of representing objects
and their relationships. Its original inventor was Charles Bachman, and it was developed into
a standard specification published in 1969 by the CODASYL Consortium. Where the
hierarchical model structures data as a tree of records, with each record having one parent
record and many children, the network model allows each record to have multiple parent and
child records, forming a lattice structure.
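
The difference between the two models can be illustrated with a small Python sketch. The
record names (orders, customers, salespeople) and the helper children_of are hypothetical;
real hierarchical and network databases use navigational record pointers rather than
dictionaries.

```python
# Hierarchical model: each record stores exactly one parent (owner).
hierarchical_parent = {
    "order-1": "customer-A",
    "order-2": "customer-A",
    "order-3": "customer-B",
}

# Network model: each record may have several parents.
network_parents = {
    "order-1": {"customer-A", "salesperson-X"},
    "order-2": {"customer-A", "salesperson-Y"},
    "order-3": {"customer-B", "salesperson-X"},
}


def children_of(parent, parents_by_record):
    """Return the records that list `parent` among their owners."""
    return sorted(
        record for record, owners in parents_by_record.items() if parent in owners
    )


orders_for_x = children_of("salesperson-X", network_parents)
print(orders_for_x)
```

In the hierarchical mapping there is no room to record that an order also belongs to a
salesperson, which is the kind of restriction the network model was designed to remove.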

The chief argument in favour of the network model, in comparison to the hierarchic model,
was that it allowed a more natural modelling of relationships between entities. Although the
model was widely implemented and used, it failed to become dominant for two main reasons.
Firstly, IBM chose to stick to the hierarchical model with semi-network extensions in their
established products such as IMS and DL/I. Secondly, it was eventually displaced by the
relational model, which offered a higher-level, more declarative interface. Until the early
1980s the performance benefits of the low-level navigational interfaces offered by
hierarchical and network databases were persuasive for many large-scale applications, but as
hardware became faster, the extra productivity and flexibility of the relational model led to
the gradual obsolescence of the network model in corporate enterprise usage.

Relational Database

• A relational database is a collection of data items organized as a set of formally
described tables from which data can be accessed or reassembled in many different
ways without having to reorganize the database tables. The relational database was
invented by E. F. Codd at IBM in 1970.

• The standard user and application program interface to a relational database is the
structured query language (SQL). SQL statements are used both for interactive
queries for information from a relational database and for gathering data for reports.

• In addition to being relatively easy to create and access, a relational database has the
important advantage of being easy to extend. After the original database creation, a
new data category can be added without requiring that all existing applications be
modified.

• A relational database is a set of tables containing data fitted into predefined
categories. Each table (which is sometimes called a relation) contains one or more
data categories in columns. Each row contains a unique instance of data for the
categories defined by the columns. For example, a typical business order entry
database would include a table that described a customer with columns for name,
address, phone number, and so forth. Another table would describe an order: product,
customer, date, sales price, and so forth. A user of the database could obtain a view of
the database that fitted the user's needs. For example, a branch office manager might
like a view or report on all customers that had bought products after a certain date. A
financial services manager in the same company could, from the same tables, obtain a
report on accounts that needed to be paid.

• When creating a relational database, you can define the domain of possible values in a
data column and further constraints that may apply to that data value. For example, a
domain of possible customers could allow up to ten possible customer names but be
constrained in one table to allowing only three of these customer names to be
specifiable.

• The definition of a relational database results in a table of metadata or formal
descriptions of the tables, columns, domains, and constraints.
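
Two of the points above, restricting a column to a domain of values and the metadata that a
table definition produces, can be sketched with Python's built-in sqlite3 module. The customer
table, its columns, and the three permitted state codes are assumptions made for illustration;
other database products express domains and catalog tables differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CHECK constraint restricts the state column to a small domain of values.
conn.execute(
    """
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        state TEXT NOT NULL CHECK (state IN ('NSW', 'VIC', 'QLD'))
    )
    """
)
conn.execute("INSERT INTO customer (name, state) VALUES ('Fred', 'NSW')")

# A value outside the domain is refused by the database itself.
try:
    conn.execute("INSERT INTO customer (name, state) VALUES ('Barney', 'XX')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# SQLite keeps the table definition as metadata in its sqlite_master table.
(schema_sql,) = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'customer'"
).fetchone()

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(rejected, customer_count)
print(schema_sql)
```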

Object-Oriented Database Definition

An object-oriented database (OODB) is a system offering DBMS facilities in an object-oriented
programming environment. Data is stored as objects and can be interpreted only using the
methods specified by its class. The relationship between similar objects is preserved
(inheritance), as are references between objects. Queries can be faster because joins are
often not needed (as they are in a relational database). This is because an object can be
retrieved directly without a search, by following its object ID. The same programming
language can be used for both data definition and data manipulation. The full power of the
database programming language's type system can be used to model data structures and the
relationship between the different data items. Multimedia applications are facilitated
because the class methods associated with the data are responsible for its correct
interpretation. OODBs typically provide better support for versioning. An object can be
viewed as the set of all its versions. Also, object versions can be treated as fully fledged
objects. OODBs also provide systematic support for triggers and constraints, which are the
basis of active databases. Most, if not all, object-oriented application programs that have
database needs will benefit from using an OODB. Ode is an example of an OODB built on C++.

Tables and relationships


In a relational database, data is stored in tables. Each table contains data about a particular
type of entity (either physical or conceptual).

For instance, our sample database is the inventory and sales system for Acme Widget Co. It
has tables containing data for the following entities:

Table 4-1. Acme Widget Co Tables

Table        Description
stock_item   Inventory items
customer     Customer account details
salesperson  Sales people working for Acme Widget Co.
sales        Sales events which occur

Tables in a database contain fields and records. Each record describes one entity. Each field
describes a single item of data for that entity. You can think of it like a spreadsheet, with the
rows being the records and the columns being the fields, thus:

Table 4-2. Sample table

ID number  Description  Price  Quantity in stock
1          widget       $9.95  12
2          gadget       $3.27  20

Every table should have a primary key, which is a field (or combination of fields) that
uniquely identifies each record. In the example above, the ID number is the primary key.

The following figures show the tables used in our database, along with their field names.
The Id field of each table is its primary key.

Table 4-3. the stock_item table

stock_item
Id
Description
Price
Quantity

Table 4-4. the customer table

Customer
Id
Name
Address
Suburb
State
Postcode

Table 4-5. the salesperson table

salesperson
Id
Name

Table 4-6. the sales table

Sales

Id
sale_date
salesperson_id
customer_id
stock_item_id
quantity
Price

• A database table contains fields and records of data about one entity
• SQL (Structured Query Language) can be used to manipulate and retrieve data in a
database
• A SELECT query may be used to retrieve records which match certain criteria
• An INSERT query may be used to add new records to the database
• A DELETE query may be used to delete records from the database
• An UPDATE query may be used to modify records in the database
• A CREATE TABLE statement may be used to create new tables in the database
• A DROP TABLE statement may be used to remove tables from the database
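
The statements listed above can be tried out with Python's built-in sqlite3 module. This is a
sketch only: it uses an in-memory SQLite database and the stock_item fields and sample rows
shown earlier, and the column types are assumptions, since the text does not specify them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define the stock_item table (column types are assumed).
conn.execute(
    "CREATE TABLE stock_item ("
    " id INTEGER PRIMARY KEY, description TEXT, price REAL, quantity INTEGER)"
)

# INSERT: add the two sample records.
conn.executemany(
    "INSERT INTO stock_item (id, description, price, quantity) VALUES (?, ?, ?, ?)",
    [(1, "widget", 9.95, 12), (2, "gadget", 3.27, 20)],
)

# SELECT: retrieve the records that match a criterion.
cheap = conn.execute(
    "SELECT description FROM stock_item WHERE price < ?", (5.00,)
).fetchall()

# UPDATE: modify an existing record.
conn.execute("UPDATE stock_item SET quantity = quantity - 1 WHERE id = ?", (1,))
widget_quantity = conn.execute(
    "SELECT quantity FROM stock_item WHERE id = 1"
).fetchone()[0]

# DELETE: remove a record.
conn.execute("DELETE FROM stock_item WHERE id = ?", (2,))
remaining = conn.execute("SELECT COUNT(*) FROM stock_item").fetchone()[0]

# DROP: remove the table itself.
conn.execute("DROP TABLE stock_item")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(cheap, widget_quantity, remaining, tables_left)
```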

Entity-Relationship Diagrams (ERD)
Data models are tools used in analysis to describe the data requirements and assumptions in the
system from a top-down perspective. They also set the stage for the design of databases later on in the
systems development life cycle (SDLC).

There are three basic elements in ER models:

Entities are the "things" about which we seek information.

Attributes are the data we collect about the entities.

Relationships provide the structure needed to draw information from multiple entities.

Generally, ERDs look like this:

Developing an ERD

Developing an ERD requires an understanding of the system and its components. Before
discussing the procedure, let's look at a narrative created by Professor Harman.

Consider a hospital:
Patients are treated in a single ward by the doctors assigned to them. Usually each patient will
be assigned a single doctor, but in rare cases they will have two. Healthcare assistants also
attend to the patients; a number of these are associated with each ward.
Initially the system will be concerned solely with drug treatment. Each patient is required to
take a variety of drugs a certain number of times per day and for varying lengths of time.
The system must record details concerning patient treatment and staff payment. Some staff
are paid part time and doctors and care assistants work varying amounts of overtime at
varying rates (subject to grade).
The system will also need to track what treatments are required for which patients and when
and it should be capable of calculating the cost of treatment per week for each patient (though
it is currently unclear to what use this information will be put).

How do we start an ERD?

1. Define Entities: these are usually nouns used in descriptions of the system, in the
discussion of business rules, or in documentation; identified in the narrative (see
highlighted items above).

2. Define Relationships: these are usually verbs used in descriptions of the system or
in discussion of the business rules (entity ______ entity); identified in the narrative
(see highlighted items above).

Fully attributed ERD with keys

3. Add attributes to the relations. These are determined by the queries, and may also suggest
new entities (e.g. grade), or they may suggest the need for keys or identifiers.

What questions can we ask?


a. Which doctors work in which wards?
b. How much will be spent in a ward in a given week?
c. How much will a patient cost to treat?
d. How much does a doctor cost per week?
e. Which assistants can a patient expect to see?
f. Which drugs are being used?
4. Add cardinality to the relations.
Many-to-many relationships must be resolved into two one-to-many relationships with an
additional entity.
This usually happens automatically.
Sometimes it involves the introduction of a link entity (which consists entirely of foreign
keys). Example: Patient-Drug.
5. This flexibility allows us to consider a variety of questions such as:
a. Which beds are free?
b. Which assistants work for Dr. X?
c. What is the least expensive prescription?
d. How many doctors are there in the hospital?
e. Which patients are family related?

6. Represent that information with symbols. Generally E-R Diagrams require the use of the
following symbols:

Reading an ERD

It takes some practice reading an ERD, but they can be used with clients to discuss business
rules.

These symbols allow us to represent the information above, as in the E-R diagram below:

ERD brings out issues:
Many-to-Manys
Ambiguities
Entities and their relationships
What data needs to be stored
The Degree of a relationship
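
The resolution of a many-to-many relationship through a link entity, as in the Patient-Drug
example, can be sketched with Python's sqlite3 module. The table name prescription, the column
names, and the sample rows are assumptions made for illustration; the hospital narrative does
not define them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE drug    (id INTEGER PRIMARY KEY, name TEXT);

    -- Link entity: one row per (patient, drug) pair, made up of foreign keys.
    CREATE TABLE prescription (
        patient_id INTEGER REFERENCES patient(id),
        drug_id    INTEGER REFERENCES drug(id),
        PRIMARY KEY (patient_id, drug_id)
    );

    INSERT INTO patient VALUES (1, 'Patient A'), (2, 'Patient B');
    INSERT INTO drug    VALUES (1, 'Drug X'), (2, 'Drug Y');
    INSERT INTO prescription VALUES (1, 1), (1, 2), (2, 2);
    """
)

# "Which drugs are being used?" becomes a join through the link table.
drugs_in_use = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT drug.name FROM drug"
        " JOIN prescription ON prescription.drug_id = drug.id"
        " ORDER BY drug.name"
    )
]

# One drug can be given to many patients, and one patient can take many drugs.
patients_on_y = [
    row[0]
    for row in conn.execute(
        "SELECT patient.name FROM patient"
        " JOIN prescription ON prescription.patient_id = patient.id"
        " WHERE prescription.drug_id = 2 ORDER BY patient.name"
    )
]
print(drugs_in_use, patients_on_y)
```

Each side of the link table is a one-to-many relationship, so the original many-to-many
relationship never has to be stored directly.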

Now, think about a university in terms of an ERD. What entities, relationships and attributes
might you consider? Look at this simplified view. There is also an example of a simplified
view of an airline on that page.

Data Flow Diagram (DFD)
The DFDs show the flow of data values from their sources in objects through the processes
that transform them to their destination in other objects. Values can include input values,
output values, and internal data stores. Control information is shown only in the form of
control flows.

The following table lists the important elements of DFDs.

Symbol        Stands For
Data process  Data processing
Data flow     Data flow or the exchange of data between processes
Data store    Data storage
Actor         Object producing and consuming data

Guidelines

You can follow certain guidelines to draw meaningful DFDs.

• Optional input flows do not exist. A process can perform its function only if all its
input flows are always available.
• You cannot assign the same data to two output flows from the same process. If a
process produces more than one data flow, these flows are mutually exclusive.
• You can split a flow, and you can merge two flows into one.

Decomposition

To specify what a high-level process does, break it down into smaller units in more DFDs. A
high-level process is an entire DFD. Each high-level process is decomposed into other
processes with data flows and data stores. Each decomposition is a DFD in itself. You can
continue to break down processes until you reach a level on which further decomposition
seems impossible or meaningless.

The data flows of the opened process are connected in the new diagram to the process related
to the opened process. Vertices, and the flows and objects connected to them, are transferred
with the flows that are connected to the decomposed process.

Example DFD

The following illustration shows a sample DFD.

Symbols
Data process

A data process transforms data values.

You can make a distinction between the following types of processes:

Process Type              Indicates
High-level                Process containing nonfunctional components such as data stores or
                          external objects that cause side effects
Low-level                 Pure function without side effects, such as the sum of two numbers
Leaf or atomic processes  Process that is not further decomposed

The name of a process is usually a description of the transformation it performs.

There are three sorts of transformation:

• Transformation of the structure, for example, reformatting
• Transformation of information contained in data
• Generation of new information

If you open a process, you can either create a new DFD or open an existing DFD in which the
process is specified.

The data flows of the opened process are connected in the new diagram to the process with
the name of the opened process. Vertices, and the flows and objects connected to them, are
transferred with the flows that are connected to the decomposed process.

If a data process has a decomposition at a lower level, an asterisk is placed inside the ellipse.
The data process can be opened only if it has a name.

Data store

A data store stores data passively for later access. A data store responds to requests to store
and access data. It does not generate any operations. A data store allows values to be accessed
in an order different from the order in which they were generated.

Input flows indicate information or operations that modify the stored data such as adding or
deleting elements or changing values. Output flows indicate information retrieved from the
store; this information can be an entire value or a component of a value.
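
The roles of a low-level process and a data store can be imitated in a few lines of Python.
This is only an analogy under stated assumptions: the class DataStore, its store and access
methods, and the function add are hypothetical names, and a DFD itself is a diagram, not code.

```python
class DataStore:
    """Passive store: it only responds to requests to store and access data."""

    def __init__(self):
        self._values = {}

    def store(self, key, value):
        # Input flow: modifies the stored data.
        self._values[key] = value

    def access(self, key):
        # Output flow: retrieves a value without changing it.
        return self._values[key]


def add(a, b):
    """Low-level process: a pure function without side effects."""
    return a + b


totals = DataStore()
totals.store("monday", add(2, 3))
totals.store("tuesday", add(10, 1))

# Values can be read back in a different order from the one they were written in.
readback = [totals.access("tuesday"), totals.access("monday")]
print(readback)
```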

Actor

An actor produces and consumes data, driving the DFD. Actors lie on the boundary of the
diagram; they terminate the flow of data as sources and sinks of data. They are also known as
terminators. Data flows between an actor and a diagram are inputs to and outputs of the
diagram. The system interacts with people through the actor.

Anchor

A DFD anchor provides a start or end point. In decomposition diagrams, anchors represent
the nodes connected to the decomposed process in the higher level diagram.

Data flow

A data flow moves data between processes or between processes and data stores. As such, it
represents a data value at some point within a computation and an intermediate value within a
computation if the flow is internal to the diagram. This value is not changed.

The names of input and output flows can indicate their roles in the computation or the type of
the value they move. Data names are preferably nouns. The name of a typical piece of data,
the data aspect, is written alongside the arrow.

Control flow

A control flow is a signal that carries out a command or indicates that something has
occurred. A control flow occurs at a discrete point in time. The arrow indicates the direction
of the control flow. The name of the event is written beside the arrow.

Control flows can correspond to messages in class communication diagrams (CCDs) or events in
state transition diagrams (STDs); however, because they duplicate information in the DFD, use
them sparingly.

Update flow

Update (or bidirectional) flows are used to indicate an update of a data store, that is, a read,
change, and store operation on a data flow.

Flow names and inheritance

Flows in DFDs must be named. However, flows can inherit the names of the objects they are
connected to. The table below shows the rules for inheritance of names. These rules are
applied in the order shown, until nothing more can be inherited. In some situations, the flow's
inherited name causes an error when a Check command is carried out. The result of the
inheritance is confusing in the diagram.

Rule  Explanation
1     Diverging flows without names inherit the name of an incoming flow with a name. If the
      incoming flow has several names, each diverging flow inherits all of them.
2     Converging flows without names inherit the name of an outgoing flow with a name. If the
      outgoing flow has several names, each converging flow inherits all of them.
3     Flows connected to a data store, control store, message queue, message box, event
      queue, or event flag inherit the name of that node.

A forked (converging or diverging) data flow is either a split or merging data flow, or a
composite data flow. A composite data flow has one name for each branch. A composite flow
can split into the original flows again. A split or a merging data flow has only one name.

The name of the flow is taken as the type name if no data type is specified.

Process Notations: Yourdon and Coad; Gane and Sarson

Datastore Notations: Yourdon and Coad; Gane and Sarson

Dataflow Notations

External Entity Notations

Data Flow Diagram Layers
Draw data flow diagrams in several nested layers. A single process node on a high level
diagram can be expanded to show a more detailed data flow diagram. Draw the context
diagram first, followed by various layers of data flow diagrams.

The nesting of data flow layers

Context Diagrams

A context diagram is a top-level (also known as Level 0) data flow diagram. It contains only
one process node (process 0), which generalizes the function of the entire system in relation
to external entities.

DFD levels

The first level DFD shows the main processes within the system. Each of these processes can
be broken into further processes until you reach pseudocode.

An example first-level data flow diagram

Key
Primary key

Most DBMSs require a table to be defined as having a single unique key, rather than a
number of possible unique keys. A primary key is a key which the database designer has
designated for this purpose. The primary key identifies the whole record.

Secondary/Foreign key
A foreign key (called a secondary key in these notes) is a field which references the primary
key of another table. Foreign keys are what establish the relationships between tables.

Data Redundancy
Data redundancy is the unnecessary duplication of data within a database. To change redundant
data, you have to make the same change in several places in the database. While this is normal
for spreadsheets and flat-file databases, it defeats the purpose of a relational database
structure.

The relationships between tables should allow you to maintain each data field in just one
location, with the relational model responsible for reflecting any change across the database.
A redundant database uses a lot of storage space unnecessarily and also creates problems for
the maintenance of the database.

Database software reduces data redundancy by centralizing the data in one database, so that
all applications can access the same data.
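
The maintenance problem can be shown with a small Python sketch. The customer, address, and
sale values are invented for illustration; the point is only that a duplicated value can be
updated in one place and missed in another, while a value stored once cannot.

```python
# Redundant layout: the customer's address is repeated on every sale row.
flat_sales = [
    {"sale": 1, "customer": "Fred", "address": "1 Stone Rd"},
    {"sale": 2, "customer": "Fred", "address": "1 Stone Rd"},
]

# Updating only one copy leaves the data inconsistent.
flat_sales[0]["address"] = "2 Rock St"
addresses_for_fred = {
    row["address"] for row in flat_sales if row["customer"] == "Fred"
}

# Non-redundant layout: the address lives in one place and sales refer to it by key.
customers = {1: {"name": "Fred", "address": "1 Stone Rd"}}
sales = [{"sale": 1, "customer_id": 1}, {"sale": 2, "customer_id": 1}]

# One update is enough, and every sale sees the new value.
customers[1]["address"] = "2 Rock St"
addresses_seen = {customers[sale["customer_id"]]["address"] for sale in sales}

print(len(addresses_for_fred), addresses_seen)
```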

Data Integrity
The database designer is responsible for incorporating elements to promote the accuracy and
reliability of stored data within the database. There are many different techniques that can be
used to encourage data integrity, with some of these dependent on what database technology
is being used. There are different types of data integrity techniques available whilst working
with Microsoft Access:

1. Referential Integrity
2. Cascade Updates & Deletes
3. Table Level Integrity
   1. Field Comparisons
   2. Validation Tables

Referential Integrity - part of the definition of a true relational database product is that it
supports referential integrity. Referential Integrity principles may be stated by:

"Every non-null foreign key value must match an existing primary key value"

If a value exists in the foreign key field of a table, then there must be a matching value in the
primary key field of the table to which it is related. Referential Integrity is all about
preserving the validity of the foreign key values.
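
The rule quoted above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for Microsoft Access; the Department and Employee tables and their rows are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys once this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Department (DepartmentCode TEXT PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Employee ("
    " EmployeeID INTEGER PRIMARY KEY,"
    " DepartmentCode TEXT REFERENCES Department(DepartmentCode))"
)
conn.execute("INSERT INTO Department VALUES ('FIN', 'Finance')")
conn.execute("INSERT INTO Employee VALUES (1, 'FIN')")  # matches an existing primary key
conn.execute("INSERT INTO Employee VALUES (2, NULL)")   # a null foreign key is permitted

try:
    # 'XYZ' has no matching row in Department, so the insert should be refused.
    conn.execute("INSERT INTO Employee VALUES (3, 'XYZ')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(orphan_rejected, employee_count)
```

Only the third insert breaks the rule: its foreign key value is non-null and matches no primary key value.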

Cascade Updates and Deletes


As with anything in the real world, things can alter and you will need to ensure that the
database can cope with this. Code names such as DepartmentCode will get revised, and
departments can close or merge, therefore we need to be able to maintain the data when
the required changes would otherwise violate referential integrity rules.
RDBMS products generally handle these changes through cascading updates and deletes
(different products may handle this differently, and have different names and techniques for
this). In some database products you may need to create rules or triggers or use an operator.
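
As one possible illustration, SQLite (used here through Python's sqlite3 module, an assumption made for this sketch) expresses cascading with ON UPDATE CASCADE and ON DELETE CASCADE clauses on the foreign key. The tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Department (DepartmentCode TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE Employee ("
    " EmployeeID INTEGER PRIMARY KEY,"
    " DepartmentCode TEXT REFERENCES Department(DepartmentCode)"
    "   ON UPDATE CASCADE ON DELETE CASCADE)"
)
conn.executemany("INSERT INTO Department VALUES (?)", [("FIN",), ("HR",)])
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(1, "FIN"), (2, "HR")])

# A revised code is propagated to every row that references it.
conn.execute("UPDATE Department SET DepartmentCode = 'FINANCE' WHERE DepartmentCode = 'FIN'")
# Closing a department removes the rows that reference it.
conn.execute("DELETE FROM Department WHERE DepartmentCode = 'HR'")

employees = conn.execute(
    "SELECT EmployeeID, DepartmentCode FROM Employee ORDER BY EmployeeID"
).fetchall()
print(employees)
```

Whether a delete should cascade or be blocked is a business decision; cascading deletes can remove more data than intended.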

Business Rules and Levels of Enforcement


Referential Integrity is enforced at the database level, in that it controls the integrity of the
data between tables. As the database designer, you can also do things at both field and table
levels to help ensure data integrity. Business rules should be implemented to ensure that the
data entered meets the requirements of a particular setting for the database.
Business rules should be documented as they are implemented. This should detail each rule,
where and how it is implemented and enforced within the database design. Over time these
rules may change, and having each and every rule documented will make it much easier to
find and modify the design.
As you implement a rule, it is important that each one is tested. Does the rule give the
intended result? What happens when the rule is violated?
Good application design will also give the user feedback (messages) when a rule is broken,
and allow them to rectify any changes they were making.
Field Level Integrity
Using Field Properties - Each of the fields that are contained in the database has properties
associated with it. These properties may be referred to as elements or attributes of the field.
These enable you, as the database designer, to place constraints on the values that may be
entered into that field.
Data Types - the most obvious constraint that can be placed on the fields in your database
will be done with the selection of a data type for the field. Data types may vary by RDBMS,
however in general they will be pretty much the same; usually, you will also be able to create
custom data types through code.
As you begin to collect information regarding the design of the database, you will be defining
what types of data can, or should be entered into the fields that you define.
• A number or numeric data type will only allow the entry of numbers and should be used for
most fields on which calculations will be performed; it will however drop leading zeros and may
occasionally encounter rounding errors.
• A currency data type can eliminate rounding errors, but may not hold as many digits as a
number data type can.
• A text field can contain basically anything, but may be limited to a certain number of
characters. It can be used for numeric data on fields where no calculations will be required, or
where the data needs to retain leading zeros.
• Memo data types, if available, will allow for a much larger number of characters.
• Date/Time fields are restricted to only allowing valid dates and times.
• A Boolean (Yes/No data type in Microsoft Access) will permit the entry of only one of two
values - yes/no, true/false or on/off.
Most of these data types can also be restricted further by setting allowable sizes (some may
already have default values that cannot be changed). Some of the data types may also allow
you to define a format, for example the number of decimal places.
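
The leading-zero and rounding remarks above can be seen in a short Python sketch. Python's int, float and Decimal types are used here only as rough analogues of number, floating-point and currency fields; they are not the Access data types themselves, and the sample values are made up.

```python
from decimal import Decimal

# A numeric type drops leading zeros; a text value keeps them.
as_number = int("00420")
as_text = "00420"

# Binary floating-point arithmetic can show small rounding errors ...
float_total = 0.1 + 0.2
# ... while a decimal type, similar in spirit to a currency field, stays exact here.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(as_number, as_text)
print(float_total == 0.3, decimal_total == Decimal("0.30"))
```

This is why codes such as postal codes or phone numbers are usually better stored as text, and money amounts in a currency or decimal type.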
Table Level Integrity
Field Comparisons - Database tables also have properties that you can use to set a validation
rule on records in the table. By doing this, you can set a rule that compares the value of one
field in the record to that of another value, in another field, in the same record. This rule is
run before the record is saved.
An example of this would be to compare dates, as part of your business rules. Your business
may have a rule in place that an OrderDespatchDate must be no more than 3 days after the
OrderPlacedDate. The rule would look something like:
OrderDespatchDate <= OrderPlacedDate + 3
If the rule is violated, an error message can be displayed, and the data must be amended
before the record can be saved.
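
In SQL-based products a rule of this kind is commonly written as a table-level CHECK constraint. The sketch below assumes SQLite through Python's sqlite3 module, dates stored as ISO text, and a hypothetical Orders table; Access would use a table validation rule instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    " OrderID INTEGER PRIMARY KEY,"
    " OrderPlacedDate TEXT NOT NULL,"
    " OrderDespatchDate TEXT,"
    # julianday() converts an ISO date to a day number, so '+ 3' means three days.
    " CHECK (julianday(OrderDespatchDate) <= julianday(OrderPlacedDate) + 3))"
)
# Despatched two days after the order was placed: accepted.
conn.execute("INSERT INTO Orders VALUES (1, '2008-11-25', '2008-11-27')")

try:
    # Despatched five days after the order was placed: the rule is violated.
    conn.execute("INSERT INTO Orders VALUES (2, '2008-11-25', '2008-11-30')")
    late_order_rejected = False
except sqlite3.IntegrityError:
    late_order_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(late_order_rejected, order_count)
```

The application can catch the error and show the user a message explaining which rule was broken.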
Validation Tables
A validation table is created to promote data integrity. Normally, a validation table will
consist of two fields; one is the primary key, and the other holds the values used by some
other field in the database. The validation table normally will hold a static set of values,
enabling you to store a master set of values in one location and, by referencing those values
instead of entering values directly into a field, you can ensure consistent values are used.
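
A minimal sketch of a validation table, assuming SQLite through Python's sqlite3 module. The SkillLevel and Member tables and their values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Validation table: a primary key plus the permitted value.
conn.execute(
    "CREATE TABLE SkillLevel (SkillLevelID INTEGER PRIMARY KEY, LevelName TEXT UNIQUE)"
)
conn.executemany(
    "INSERT INTO SkillLevel VALUES (?, ?)",
    [(1, "Beginner"), (2, "Intermediate"), (3, "Expert")],
)
# Other tables reference the validation table instead of storing free text.
conn.execute(
    "CREATE TABLE Member ("
    " MemberID INTEGER PRIMARY KEY,"
    " Name TEXT,"
    " SkillLevelID INTEGER NOT NULL REFERENCES SkillLevel(SkillLevelID))"
)
conn.execute("INSERT INTO Member VALUES (1, 'Bill', 3)")

try:
    conn.execute("INSERT INTO Member VALUES (2, 'Mary', 9)")  # 9 is not a listed value
    invalid_value_rejected = False
except sqlite3.IntegrityError:
    invalid_value_rejected = True

levels = conn.execute(
    "SELECT m.Name, s.LevelName FROM Member m "
    "JOIN SkillLevel s ON s.SkillLevelID = m.SkillLevelID"
).fetchall()
print(invalid_value_rejected, levels)
```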

Database Normalization
Database normalization, sometimes referred to as canonical synthesis, is a technique for
designing relational database tables to minimize duplication of information and, in so doing,
to safeguard the database against certain types of logical or structural problems, namely data
anomalies. For example, when multiple instances of a given piece of information occur in a
table, the possibility exists that these instances will not be kept consistent when the data
within the table is updated, leading to a loss of data integrity. A table that is sufficiently
normalized is less vulnerable to problems of this kind, because its structure reflects the basic
assumptions for when multiple instances of the same information should be represented by a
single instance only.

Higher degrees of normalization typically involve more tables and create the need for a larger
number of joins, which can reduce performance. Accordingly, more highly normalized tables
are typically used in database applications involving many isolated transactions (e.g. an
Automated teller machine), while less normalized tables tend to be used in database
applications that need to map complex relationships between data entities and data attributes
(e.g. a reporting application, or a full-text search application).

Database theory describes a table's degree of normalization in terms of normal forms of
successively higher degrees of strictness. A table in third normal form (3NF), for example, is
consequently in second normal form (2NF) as well; but the reverse is not always the case.

Although the normal forms are often defined informally in terms of the characteristics of
tables, rigorous definitions of the normal forms are concerned with the characteristics of
mathematical constructs known as relations. Whenever information is represented
relationally, it is meaningful to consider the extent to which the representation is normalized.

1NF Eliminate Repeating Groups - Make a separate table for each set of related attributes, and
give each table a primary key.

2NF Eliminate Redundant Data - If an attribute depends on only part of a multi-valued
(composite) key, remove it to a separate table.

3NF Eliminate Columns Not Dependent On Key - If attributes do not contribute to a description of
the key, remove them to a separate table.

BCNF Boyce-Codd Normal Form - If there are non-trivial dependencies between candidate key
attributes, separate them out into distinct tables.

4NF Isolate Independent Multiple Relationships - No table may contain two or more 1:n or n:m
relationships that are not directly related.

5NF Isolate Semantically Related Multiple Relationships - There may be practical constraints on
information that justify separating logically related many-to-many relationships.

ONF Optimal Normal Form - a model limited to only simple (elemental) facts, as expressed in
Object Role Model notation.

DKNF Domain-Key Normal Form - a model free from all modification anomalies.

1. Eliminate Repeating Groups

In the original member list, each member name is followed by any databases that the member
has experience with. Some might know many, and others might not know any. To answer the
question, "Who knows DB2?" we need to perform an awkward scan of the list looking for
references to DB2. This is inefficient and an extremely untidy way to store information.

Moving the known databases into a separate table helps a lot. Separating the repeating groups
of databases from the member information results in first normal form. The MemberID in
the database table matches the primary key in the member table, providing a foreign key for
relating the two tables with a join operation. Now we can answer the question by looking in
the database table for "DB2" and getting the list of members.
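
The split can be sketched as follows, assuming SQLite through Python's sqlite3 module. The table name KnownDatabase and all sample rows are hypothetical; the original member list is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, Name TEXT)")
# One row per member/database pair replaces the repeating group.
conn.execute(
    "CREATE TABLE KnownDatabase ("
    " MemberID INTEGER REFERENCES Member(MemberID),"
    " DatabaseID INTEGER,"
    " DatabaseName TEXT,"
    " PRIMARY KEY (MemberID, DatabaseID))"
)
conn.executemany(
    "INSERT INTO Member VALUES (?, ?)", [(1, "Bill"), (2, "Mary"), (3, "Steve")]
)
conn.executemany(
    "INSERT INTO KnownDatabase VALUES (?, ?, ?)",
    [(1, 10, "DB2"), (1, 20, "Oracle"), (2, 10, "DB2")],  # Steve lists no databases
)

# "Who knows DB2?" becomes a simple join instead of a scan of free-form lists.
who_knows_db2 = [
    name
    for (name,) in conn.execute(
        "SELECT m.Name FROM Member m "
        "JOIN KnownDatabase k ON k.MemberID = m.MemberID "
        "WHERE k.DatabaseName = 'DB2' ORDER BY m.Name"
    )
]
print(who_knows_db2)
```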

2. Eliminate Redundant Data

In the Database Table, the primary key is made up of the MemberID and the DatabaseID.
This makes sense for other attributes like "Where Learned" and "Skill Level" attributes, since
they will be different for every member/database combination. But the database name
depends only on the DatabaseID. The same database name will appear redundantly every
time its associated ID appears in the Database Table.

Suppose you want to reclassify a database - give it a different DatabaseID. The change has to
be made for every member that lists that database! If you miss some, you'll have several
members with the same database under different IDs. This is an update anomaly.

Or suppose the last member listing a particular database leaves the group. His records will be
removed from the system, and the database will not be stored anywhere! This is a delete
anomaly. To avoid these problems, we need second normal form.

To achieve this, separate the attributes depending on both parts of the key from those
depending only on the DatabaseID. This results in two tables: "Database" which gives the
name for each DatabaseID, and "MemberDatabase" which lists the databases for each
member.

Now we can reclassify a database in a single operation: look up the DatabaseID in the
"Database" table and change its name. The result will instantly be available throughout the
application.
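
A sketch of the two tables, assuming SQLite through Python's sqlite3 module. The sample rows and the new name "DB2 UDB" are made up for illustration; "Database" is quoted because it is also an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Database" (DatabaseID INTEGER PRIMARY KEY, DatabaseName TEXT)')
conn.execute(
    "CREATE TABLE MemberDatabase ("
    " MemberID INTEGER,"
    " DatabaseID INTEGER,"
    " SkillLevel TEXT,"
    " PRIMARY KEY (MemberID, DatabaseID))"
)
conn.executemany('INSERT INTO "Database" VALUES (?, ?)', [(10, "DB2"), (20, "Oracle")])
conn.executemany(
    "INSERT INTO MemberDatabase VALUES (?, ?, ?)",
    [(1, 10, "Expert"), (2, 10, "Beginner"), (2, 20, "Expert")],
)

# One UPDATE changes the name for every member that lists the database.
conn.execute('UPDATE "Database" SET DatabaseName = ? WHERE DatabaseID = ?', ("DB2 UDB", 10))
names_for_db_10 = {
    name
    for (name,) in conn.execute(
        "SELECT d.DatabaseName FROM MemberDatabase md "
        'JOIN "Database" d ON d.DatabaseID = md.DatabaseID '
        "WHERE md.DatabaseID = 10"
    )
}

# Removing the last member who lists Oracle no longer loses the database itself.
conn.execute("DELETE FROM MemberDatabase WHERE DatabaseID = 20")
still_listed = conn.execute(
    'SELECT DatabaseName FROM "Database" WHERE DatabaseID = 20'
).fetchone()
print(names_for_db_10, still_listed)
```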

3. Eliminate Columns Not Dependent On Key

The Member table satisfies first normal form - it contains no repeating groups. It satisfies
second normal form - since it doesn't have a multivalued key. But the key is MemberID, and
the company name and location describe only a company, not a member. To achieve third
normal form, they must be moved into a separate table. Since they describe a company,
CompanyCode becomes the key of the new "Company" table.

The motivation for this is the same as for second normal form: we want to avoid update and
delete anomalies. For example, suppose no members from IBM were currently stored in
the database. With the previous design, there would be no record of its existence, even though
20 past members were from IBM!
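
The move to a separate Company table might look like the sketch below, which assumes SQLite through Python's sqlite3 module. The column names other than MemberID and CompanyCode, and the sample rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Company (CompanyCode TEXT PRIMARY KEY, CompanyName TEXT, Location TEXT)"
)
conn.execute(
    "CREATE TABLE Member ("
    " MemberID INTEGER PRIMARY KEY,"
    " Name TEXT,"
    " CompanyCode TEXT REFERENCES Company(CompanyCode))"
)
conn.execute("INSERT INTO Company VALUES ('IBM', 'IBM', 'Armonk')")
conn.execute("INSERT INTO Member VALUES (1, 'Bill', 'IBM')")

# The last IBM member leaves the group ...
conn.execute("DELETE FROM Member WHERE CompanyCode = 'IBM'")

# ... but the company's details are still on record.
companies = conn.execute("SELECT CompanyCode, Location FROM Company").fetchall()
member_count = conn.execute("SELECT COUNT(*) FROM Member").fetchone()[0]
print(companies, member_count)
```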

BCNF. Boyce-Codd Normal Form

Boyce-Codd Normal Form states mathematically that:


A relation R is said to be in BCNF if whenever X -> A holds in R, and A is not in X, then X
is a superkey for R (that is, X contains a candidate key).

BCNF covers very specific situations where 3NF misses inter-dependencies between non-key
(but candidate key) attributes. Typically, any relation that is in 3NF is also in BCNF.
However, a 3NF relation won't be in BCNF if (a) there are multiple candidate keys, (b) the
keys are composed of multiple attributes, and (c) there are common attributes between the
keys.

Basically, a humorous way to remember BCNF is that all functional dependencies are:
"The key, the whole key, and nothing but the key, so help me Codd."
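
The BCNF condition can be checked mechanically by computing attribute closures. The Python sketch below is one possible way to do it, not a method given in this document; the Student/Course/Teacher relation is a hypothetical textbook-style example with two overlapping composite candidate keys.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """List the non-trivial dependencies whose left side does not determine all of relation."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if not closure(lhs, fds) >= set(relation):
            violations.append((lhs, rhs))
    return violations


# Hypothetical relation: each teacher teaches exactly one course.
relation = ("Student", "Course", "Teacher")
fds = [
    (("Student", "Course"), ("Teacher",)),
    (("Teacher",), ("Course",)),
]
violations = bcnf_violations(relation, fds)

# Splitting off (Teacher, Course) leaves a relation with no violation.
split_violations = bcnf_violations(("Teacher", "Course"), [(("Teacher",), ("Course",))])
print(violations, split_violations)
```

Here Teacher -> Course is reported because Teacher alone does not determine Student, so it is not a key of the relation.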

4. Isolate Independent Multiple Relationships

This applies primarily to key-only associative tables, and appears as a ternary relationship,
but has incorrectly merged 2 distinct, independent relationships.

The way this situation starts is by a business request like the one shown below. This could be
any 2 M:M relationships from a single entity. For instance, a member could know many
software tools, and a software tool may be used by many members. Also, a member could
have recommended many books, and a book could be recommended by many members.

Initial business request

So, to resolve the two M:M relationships, we know that we should resolve them separately,
and that would give us 4th normal form. But, if we were to combine them into a single table,
it might look right (it is in 3rd normal form) at first. This is shown below, and violates 4th
normal form.

Incorrect solution

To get a picture of what is wrong, look at some sample data, shown below. The first few
records look right, where Bill knows ERWin and recommends the ERWin Bible for everyone
to read. But something is wrong with Mary and Steve. Mary didn't recommend a book, and
Steve doesn't know any software tools. Our solution has forced us to do strange things like
create dummy records in both Book and Software to allow the record in the association, since
it is a key-only table.

Sample data from incorrect solution

The correct solution, to cause the model to be in 4th normal form, is to ensure that all M:M
relationships are resolved independently if they are indeed independent, as shown below.

Correct 4th normal form

NOTE! This is not to say that ALL ternary associations are invalid. The above situation
made it obvious that Books and Software were independently linked to Members. If,
however, there were distinct links between all three, such that we would be stating that "Bill
recommends the ERWin Bible as a reference for ERWin", then separating the relationship
into two separate associations would be incorrect. In that case, we would lose the distinct
information about the 3-way relationship.
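
For the independent case, the two associations can be sketched as two key-only tables. The example assumes SQLite through Python's sqlite3 module; the table names and most rows are hypothetical (only Bill's row follows the text above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each many-to-many relationship gets its own key-only association table.
conn.execute(
    "CREATE TABLE MemberSoftware (Member TEXT, Software TEXT, PRIMARY KEY (Member, Software))"
)
conn.execute(
    "CREATE TABLE MemberBook (Member TEXT, Book TEXT, PRIMARY KEY (Member, Book))"
)
conn.executemany(
    "INSERT INTO MemberSoftware VALUES (?, ?)", [("Bill", "ERWin"), ("Mary", "ERWin")]
)
conn.executemany(
    "INSERT INTO MemberBook VALUES (?, ?)",
    [("Bill", "ERWin Bible"), ("Steve", "ERWin Bible")],
)

# Mary recommends no book and Steve knows no tool, yet no dummy rows are needed.
mary_books = conn.execute("SELECT Book FROM MemberBook WHERE Member = 'Mary'").fetchall()
steve_software = conn.execute(
    "SELECT Software FROM MemberSoftware WHERE Member = 'Steve'"
).fetchall()
print(mary_books, steve_software)
```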

5. Isolate Semantically Related Multiple Relationships

OK, now let's modify the original business diagram and add a link between the books and the
software tools, indicating which books deal with which software tools, as shown below.

Initial business request

This makes sense after the discussion on Rule 4, and again we may be tempted to resolve the
multiple M:M relationships into a single association, which would now violate 5th normal
form. The ternary association looks identical to the one shown in the 4th normal form
example, and is also going to have trouble displaying the information correctly. This time we
would have even more trouble because we can't show the relationships between books and
software unless we have a member to link to, or we have to add our favorite dummy member
record to allow the record in the association table.

Incorrect solution

The solution, as before, is to ensure that all M:M relationships that are independent are
resolved independently, resulting in the model shown below. Now information about
members and books, members and software, and books and software are all stored
independently, even though they are all very much semantically related. It is very tempting in
many situations to combine the multiple M:M relationships because they are so similar.
Within complex business discussions, the lines can become blurred and the correct solution
not so obvious.

Correct 5th normal form
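
A sketch of the three independent associations, assuming SQLite through Python's sqlite3 module. Table names and rows are hypothetical; the point is that a book can be linked to a software tool without any member being involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table, left, right in [
    ("MemberBook", "Member", "Book"),
    ("MemberSoftware", "Member", "Software"),
    ("BookSoftware", "Book", "Software"),
]:
    # Three key-only association tables, one per many-to-many relationship.
    conn.execute(
        f"CREATE TABLE {table} ({left} TEXT, {right} TEXT, PRIMARY KEY ({left}, {right}))"
    )

# The book/software link is stored on its own; no dummy member is required.
conn.execute("INSERT INTO BookSoftware VALUES ('ERWin Bible', 'ERWin')")

books_about_erwin = [
    book
    for (book,) in conn.execute("SELECT Book FROM BookSoftware WHERE Software = 'ERWin'")
]
member_rows = conn.execute("SELECT COUNT(*) FROM MemberBook").fetchone()[0]
print(books_about_erwin, member_rows)
```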

6. Optimal Normal Form

At this point, we have done all we can with Entity-Relationship Diagrams (ERD). Most
people will stop here because this is usually pretty good. However, another modeling style
called Object Role Modeling (ORM) can display relationships that cannot be expressed in
ERD. Therefore there are more normal forms beyond 5th, such as Optimal Normal Form (ONF).
It is defined as a model limited to only simple (elemental) facts, as expressed in ORM.

7. Domain-Key Normal Form

This level of normalization is simply a model taken to the point where there are no
opportunities for modification anomalies.

• "if every constraint on the relation is a logical consequence of the definition of keys and
domains"
• Constraint "a rule governing static values of attributes"
• Key "unique identifier of a tuple"
• Domain "description of an attribute’s allowed values"

1. A relation in DK/NF has no modification anomalies, and conversely.
2. DK/NF is the ultimate normal form; there is no higher normal form related to
modification anomalies.
3. Defn: A relation is in DK/NF if every constraint on the relation is a logical
consequence of the definition of keys and domains.
4. Constraint is any rule governing static values of attributes that is precise enough to be
ascertained whether or not it is true
5. E.g. edit rules, intra-relation and inter-relation constraints, functional and multi-
valued dependencies.
6. Not including constraints on changes in data values or time-dependent constraints.
7. Key - the unique identifier of a tuple.
8. Domain: a physical and a logical description of an attribute's allowed values.
9. Physical description is the format of an attribute.
10. Logical description is a further restriction of the values the domain is allowed
11. Logical consequence: find a constraint on keys and/or domains which, if it is
enforced, means that the desired constraint is also enforced.
12. Bottom line on DK/NF: If every table has a single theme, then all functional
dependencies will be logical consequences of keys. All data value constraints can
then be expressed as domain constraints.
13. Practical consequence: Since keys are enforced by the DBMS and domains are
enforced by edit checks on data input, all modification anomalies can be avoided by
just these two simple measures.

Components of DBMS
Data dictionary/directory
In database management systems, a data dictionary is a file that defines the basic organization of a database. A data
dictionary contains a list of all files in the database, the number of records in each file, and
the names and types of each field. Most database management systems keep the data
dictionary hidden from users to prevent them from accidentally destroying its contents.

Data dictionaries do not contain any actual data from the database, only bookkeeping
information for managing it. Without a data dictionary, however, a database management
system cannot access data from the database.
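
As an illustration of this kind of bookkeeping information, SQLite keeps a catalog table named sqlite_master and offers PRAGMA table_info for column details. The sketch assumes Python's sqlite3 module and a hypothetical Member table; other products expose their data dictionary differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Member VALUES (1, 'Bill')")

# sqlite_master describes the tables themselves, not the rows stored in them.
tables = [
    name
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Member)")]
print(tables, columns)
```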

Data languages
A special language known as DDL (Data Definition Language or Data Description Language)
is used to define the entries in the data dictionary. This language is mainly the concern of
database administrators.

Teleprocessing monitors
A teleprocessing monitor is a communication software package that manages
communications between the database and remote terminals. An example is a transaction
sent from a remote terminal to the database. The teleprocessing monitor is normally a part of
the DBMS.

Application development system


An application development system is a set of programs designed to help programmers in
developing the applications that use the database. An application development system may or
may not be a component of a DBMS. For example, “Oracle Forms” is an application
development package shipped with the Oracle DBMS.

Security software
A security software package provides a variety of tools to protect the database from
unauthorized access. Security software also protects data from being corrupted or
damaged. Oracle Corporation claims to have absolute security on its DBMS.

Archiving and recovery system


Archiving or backup provides a way to make copies of the database, which can be used in
case the original database records are damaged. The recovery system restores the damaged data
from its copy.

Report writers
A report writer allows programmers, managers and other users to design output reports
without writing much code in any programming language.

SQL and other Query languages


A query language consists of a set of commands used for retrieving, updating, inserting, and
deleting records in a database. SQL (Structured Query Language) is a query language that
has become the standard for almost all database management systems.
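
The four basic kinds of SQL command can be tried in a short sketch, assuming SQLite through Python's sqlite3 module and a hypothetical Member table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, Name TEXT)")

# INSERT adds records.
conn.execute("INSERT INTO Member VALUES (1, 'Bill')")
conn.execute("INSERT INTO Member VALUES (2, 'Mary')")
# UPDATE changes existing records.
conn.execute("UPDATE Member SET Name = 'William' WHERE MemberID = 1")
# DELETE removes records.
conn.execute("DELETE FROM Member WHERE MemberID = 2")
# SELECT retrieves records.
remaining = conn.execute("SELECT MemberID, Name FROM Member").fetchall()
print(remaining)
```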
