1
OVERVIEW
Topics Covered:
1.1 Database management system
1.2 Data Independence
1.3 Data Abstraction
1.4 Data Models
1.5 DBMS Architecture
1.6 Users of DBMS
1.7 Overview of Conventional Data Models
1.1 DATABASE MANAGEMENT SYSTEM (DBMS)

DEFINITION: A database management system (DBMS) is a collection of
interrelated data and a set of programs to access those data.
The collection of data is referred to as a database.
The primary goal of a DBMS is to provide a way to store and
retrieve database information that is both convenient and efficient.
A DBMS allows us to define the structure for storing information
and also provides mechanisms to manipulate this information. A DBMS
also keeps the stored information safe despite system
crashes or attempts at unauthorized access.
Limitations of a file-based data processing environment:
1) Data redundancy and inconsistency:- Different files have different
formats, and the programs that use them are written in different
programming languages by different users. So the same information
may be duplicated in several files, which may lead to data
inconsistency. For example, if a customer changes his address, the
change may be reflected in one copy of the data but not in the other.
2) Difficulty in accessing data:- The file system environment does
not allow needed data to be retrieved in a convenient and
efficient manner.
3) Data isolation:- Data is scattered in various files, and the files
may be in different formats, so the data becomes isolated.
4) Integrity problems:- Data values stored in the database must
satisfy consistency constraints. Problems occur when
constraints involve several data items from different files.
5) Atomicity problems:- If a failure occurs, the data must be restored
to the consistent state that existed prior to the failure. For example,
suppose a person abc is transferring Rs 5000 to the account of pqr.
If the money has been withdrawn from abc's account but the system
fails before it is deposited into pqr's account, then Rs 5000 should
be deposited back into abc's bank account.
6) Concurrent access anomalies:- Many systems allow multiple
users to update data simultaneously. Concurrent updates
should not result in inconsistent data.
7) Security problems:- Not every user of the database system
should be able to access all data. The database should be
protected from access by unauthorized users.
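The transfer described in point 5 can be sketched with a transactional DBMS. The following is a minimal illustration using Python's built-in sqlite3 module. The account table, its columns, the starting balances, and the fail_midway flag are assumptions made for this example only.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as a single atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                # Hypothetical stand-in for a system failure between the two steps.
                raise RuntimeError("simulated system failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("abc", 10000), ("pqr", 2000)])
conn.commit()

# The failed transfer leaves both balances exactly as they were.
transfer(conn, "abc", "pqr", 5000, fail_midway=True)
print(balances(conn))
```

Because the withdrawal and the deposit run inside one transaction, the simulated failure undoes the withdrawal, which is the behaviour the atomicity requirement asks for.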
1.2 DATA INDEPENDENCE

We can define two types of data independence:
1. Logical data independence:
It is the capacity to change the conceptual schema without
having to change external schemas or application programs. We
may change the conceptual schema to expand the database (by
adding a record type or data item), or to reduce the database (by
removing a record type or data item). In the latter case, external
schemas that refer only to the remaining data should not be
affected. Only the view definition and the mappings need be
changed in a DBMS that supports logical data independence.
Application programs that reference the external schema constructs
must work as before, after the conceptual schema undergoes a
logical reorganization. Changes to constraints can be applied also
to the conceptual schema without affecting the external schemas or
application programs.
2. Physical data independence:
It is the capacity to change the internal schema without
having to change the conceptual (or external) schemas. Changes
to the internal schema may be needed because some physical files
had to be reorganized (for example, by creating additional access
structures) to improve the performance of retrieval or update. If the
same data as before remains in the database, we should not have
to change the conceptual schema.
Whenever we have a multiple-level DBMS, its catalog must be
expanded to include information on
how to map requests and data among the various levels. The
DBMS uses additional software to accomplish these mappings by
referring to the mapping information in the catalog. Data
independence is accomplished because, when the schema is
changed at some level, the schema at the next higher level remains
unchanged; only the mapping between the two levels is changed.
Hence, application programs referring to the higher-level schema
need not be changed.
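Both kinds of data independence can be illustrated with a small sketch. It uses Python's built-in sqlite3 module. The employee table, the employee_names view, the index name, and the sample rows are assumptions chosen for this example. The view plays the role of an external schema, the table the conceptual schema, and the index an internal-level access structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", 50000), (2, "Ravi", 42000)])
# External schema: a view that exposes only part of the conceptual schema.
conn.execute("CREATE VIEW employee_names AS SELECT id, name FROM employee")
conn.commit()

def app_query(conn):
    """An application program written only against the external schema."""
    return conn.execute(
        "SELECT id, name FROM employee_names ORDER BY id"
    ).fetchall()

before = app_query(conn)

# Logical change: expand the conceptual schema with a new data item.
conn.execute("ALTER TABLE employee ADD COLUMN department_id INTEGER")
after_logical = app_query(conn)

# Physical change: add an access structure at the internal level.
conn.execute("CREATE INDEX idx_employee_name ON employee(name)")
after_physical = app_query(conn)

print(before == after_logical == after_physical)
```

The application query is never edited, yet it keeps returning the same rows after each change.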
1.3 DATA ABSTRACTION:

A major purpose of a DBMS is to provide users with an abstract view
of data, i.e. the system hides certain details of how the data are
stored and maintained.
Since many database system users are not computer trained,
developers hide the complexity from users through 3 levels of
abstraction, to simplify users' interaction with the system.
1) Physical level of data abstraction:
This is the lowest level of abstraction, which describes how
data are actually stored.
2) Logical level of data abstraction:
This level describes what data are actually stored in the
database and what relationships exist among them.
3) View level of data abstraction:
This is the highest level of abstraction, which describes only
part of the entire database. Views also provide a security mechanism
to prevent users from accessing certain parts of the database.
1.4 DATA MODELS

Many data models have been proposed, and we can
categorize them according to the types of concepts they use to
describe the database structure.
High-level or conceptual data models provide concepts
that are close to the way many users perceive data, whereas
low-level or physical data models provide concepts that describe the
details of how data is stored in the computer. Concepts provided
by low-level data models are generally meant for computer
specialists, not for typical end users. Between these two extremes
is a class of representational (or implementation) data models,
which provide concepts that may be understood by end users but
that are not too far removed from the way data is organized within
the computer. Representational data models hide some details of
data storage but can be implemented on a computer system in a
direct way. Conceptual data models use concepts such as
entities, attributes, and relationships.
An entity represents a real-world object or concept, such as
an employee or a project, that is described in the database. An
attribute represents some property of interest that further
describes an entity, such as the employee's name or salary. A
relationship among two or more entities represents an interaction
among the entities. These concepts are explained by the
Entity-Relationship model, a popular high-level conceptual data model.
Representational or implementation data models are the
models used most frequently in traditional commercial DBMSs,
and they include the widely used relational data model, as well as
the so-called legacy data models (the network and hierarchical
models) that have been widely used in the past.
We can regard object data models as a new family of
higher-level implementation data models that are closer to conceptual
data models.
Object data models are also frequently utilized as
high-level conceptual models, particularly in the software engineering
domain.
Physical data models describe how data is stored in the
computer by representing information such as record formats,
record orderings, and access paths. An access path is a structure
that makes the search for particular database records efficient.
1.5 DBMS ARCHITECTURE
Fig: Three-Schema DBMS Architecture
The goal of the three-schema architecture, illustrated in
the figure above, is to separate the user applications from the
physical database. In this architecture, schemas can be defined at the
following three levels:
1. The internal level has an internal schema, which describes
the physical storage structure of the database. The internal
schema uses a physical data model and describes the complete
details of data storage and access paths for the database.
2. The conceptual level has a conceptual schema, which
describes the structure of the whole database for a community of
users. The conceptual schema hides the details of physical
storage structures and concentrates on describing entities, data
types, relationships, user operations, and constraints. A high-level
data model or an implementation data model can be used at this
level.
3. The external or view level includes a number of external
schemas or user views. Each external schema describes the part
of the database that a particular user group is interested in and
hides the rest of the database from that user group. A high-level
data model or an implementation data model can be used at this
level.
The three-schema architecture is a convenient tool for the
user to visualize the schema levels in a database system. In most
DBMSs that support user views, external schemas are specified in
the same data model that describes the conceptual-level
information. Some DBMSs allow different data models to be used
at the conceptual and external levels. Notice that the three
schemas are only descriptions of data; the only data that actually
exists is at the physical level. In a DBMS based on the
three-schema architecture, each user group refers only to its own
external schema. Hence, the DBMS must transform a request
specified on an external schema into a request against the
conceptual schema, and then into a request on the internal
schema for processing over the stored database. If the request is
a database retrieval, the data extracted from the stored database
must be reformatted to match the user's external view. The
processes of transforming requests and results between levels are
called mappings. These mappings may be time-consuming, so
some DBMSs (especially those that are meant to support small
databases) do not support external views. Even in such systems,
however, a certain amount of mapping is necessary to transform
requests between the conceptual and internal levels.
1.6 PEOPLE WHO WORK WITH THE DATABASE:

The people who use the database can be categorized as:
a) Database users
b) Database administrator (DBA).
a) Database users are of 4 different types:
1) Naive users:
These are the unsophisticated users who interact with the
system by invoking one of the application programs that have been
written previously.
E.g. consider a user who checks account balance information
over the World Wide Web. Such a user accesses a form and enters the
account number, password, etc. The application program on
the internet then retrieves the account balance using the given
account information, and the balance is passed back to the user.
2) Application programmers:
These are computer professionals who write application
programs and develop user interfaces. The application
programmer uses a Rapid Application Development (RAD) toolkit or
special types of programming languages which include special
features to facilitate the generation of forms and the display of
data on the screen.
3) Sophisticated users:
These users interact with the database using a database
query language. They submit their queries to the query processor.
Then Data Manipulation Language (DML) functions are performed
on the database to retrieve the data. Tools used by these users are
OLAP (Online Analytical Processing) and data mining tools.
4) Specialized users:
These users write specialized database applications to
retrieve data. These applications can be used to retrieve data with
complex data types e.g. graphics data and audio data.
b) Database Administrator (DBA)
A person who has central control over the data and the
programs that access the data is called the DBA. The following are
the functions of the DBA.
1) Schema definition: The DBA creates the database schema by
executing Data Definition Language (DDL) statements.
2) Storage structure and access method definition
3) Schema and physical organization modification: If any
changes are to be made in the original schema to fit the needs of
the organization, these changes are carried out by the DBA.
4) Granting of authorization for data access: The DBA decides
which parts of the data can be accessed by which users. Before any
user accesses the data, the DBMS checks which rights have been
granted to the user by the DBA.
5) Routine maintenance: The DBA has to take periodic backups of the
database, ensure that enough disk space is available to store new
data, and ensure that the performance of the DBMS is not degraded
by any operation carried out by the users.
1.7 OVERVIEW OF CONVENTIONAL DATA MODELS:

1.7.1 Hierarchical Data Model:
One of the most important applications for the earliest
database management systems was production planning for
manufacturing companies. If an automobile manufacturer decided
to produce 10,000 units of one car model and 5,000 units of
another model, it needed to know how many parts to order from its
suppliers. To answer the question, the product (a car) had to be
decomposed into assemblies (engine, body, chassis), which were
decomposed into subassemblies (valves, cylinders, spark plugs),
and then into sub-subassemblies, and so on. Handling this list of
parts, known as a bill of materials, was a job tailor-made for
computers. The bill of materials for a product has a natural
hierarchical structure. To store this data, the hierarchical data
model, illustrated in the figure below, was developed. In this model,
each record in the database represented a specific part. The
records had parent/child relationships, linking each part to its
subpart, and so on.
[Figure: tree diagram with CAR at the root; the boxes shown are ENGINE, BODY, CHASSIS, LEFT DOOR, RIGHT DOOR, ROOF, HANDLE, WINDOW, and LOCK]
Figure 4-2: A hierarchical bill-of-materials database
To access the data in the database, a program could:

find a particular part by number (such as the left door),
move "down" to the first child (the door handle),
move "up" to its parent (the body), or
move "sideways" to the next child (the right door).
Retrieving the data in a hierarchical database thus required
navigating through the records, moving up, down, and sideways
one record at a time.
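The record-at-a-time navigation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how IMS or any real hierarchical DBMS is implemented. The Part class and the down, up, and sideways helpers are hypothetical names, and the tree contains only the parts whose positions are stated in the text.

```python
class Part:
    """One record in the hierarchy, linked to its parent and children."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def down(part):
    """Move to the first child, or None if the part has no children."""
    return part.children[0] if part.children else None

def up(part):
    """Move to the parent, or None at the root."""
    return part.parent

def sideways(part):
    """Move to the next child of the same parent, or None if there is none."""
    if part.parent is None:
        return None
    siblings = part.parent.children
    position = siblings.index(part)
    return siblings[position + 1] if position + 1 < len(siblings) else None

# A partial bill of materials, built only from relationships named in the text.
car = Part("car")
engine = Part("engine", car)
body = Part("body", car)
chassis = Part("chassis", car)
left_door = Part("left door", body)
right_door = Part("right door", body)
handle = Part("handle", left_door)

print(down(left_door).name, up(left_door).name, sideways(left_door).name)
```

Each helper follows exactly one pointer, which is why a program had to issue a sequence of such moves to reach the record it wanted.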
One of the most popular hierarchical database management
systems was IBM's Information Management System (IMS), first
introduced in 1968.
The advantages of IMS and its hierarchical model are as follows:
Simple structure: The organization of an IMS database was
easy to understand. The database hierarchy paralleled that of a
company organization chart or a family tree.
Parent/child organization: An IMS database was excellent for
representing parent/child relationships, such as "A is a part of B" or
"A is owned by B."
Performance: IMS stored parent/child relationships as physical
pointers from one data record to another, so that movement
through the database was rapid. Because the structure was
simple, IMS could place parent and child records close to one
another on the disk, minimizing disk input/output.
IMS is still a very widely used DBMS on IBM mainframes. Its
raw performance makes it the database of choice in high-volume
transaction processing applications such as processing bank ATM
transactions, verifying credit card numbers, and tracking the
delivery of overnight packages. Although relational database
performance has improved dramatically over the last decade, the
performance requirements of applications such as these have also
increased, ensuring a continued role for IMS.
1.7.2 Network Data Model:
The simple structure of a hierarchical database became a
disadvantage when the data had a more complex structure. In an
order-processing database, for example, a single order might
participate in three different parent/child relationships, linking the
order to the customer who placed it, the salesperson who took it,
and the product ordered. The structure of this type of data simply
didn't fit the strict hierarchy of IMS.
To deal with applications such as order processing, a new
network data model was developed. The network data model
extended the hierarchical model by allowing a record to participate
in multiple parent/child relationships.
For a programmer, accessing a network database was very
similar to accessing a hierarchical database. An application
program could:
find a specific parent record by key (such as a customer number),
move down to the first child in a particular set (the first order
placed by this customer),
move sideways from one child to the next in the set (the next
order placed by the same customer), or
move up from a child to its parent in another set (the salesperson
who took the order).
Once again the programmer had to navigate the database
record-by-record, this time specifying which relationship to navigate
as well as the direction.
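The difference from the hierarchical model is that every move names the set (relationship) being followed. The sketch below illustrates this in Python. It is not CODASYL syntax, and the Record class, the helper functions, the set names placed_by and taken_by, and the sample keys are all assumptions made for the example.

```python
class Record:
    """A record that can be a member of several sets and an owner of others."""
    def __init__(self, kind, key):
        self.kind = kind
        self.key = key
        self.owners = {}   # set name -> owner (parent) record
        self.members = {}  # set name -> list of member (child) records

def connect(set_name, owner, member):
    """Link member to owner within the named set."""
    owner.members.setdefault(set_name, []).append(member)
    member.owners[set_name] = owner

def first_member(owner, set_name):
    """Move down to the first child in a particular set."""
    members = owner.members.get(set_name, [])
    return members[0] if members else None

def next_member(member, set_name):
    """Move sideways to the next child in the same set."""
    siblings = member.owners[set_name].members[set_name]
    position = siblings.index(member)
    return siblings[position + 1] if position + 1 < len(siblings) else None

def owner_of(member, set_name):
    """Move up to the parent in the named set."""
    return member.owners.get(set_name)

customer = Record("customer", 2101)
salesperson = Record("salesperson", 17)
order_1 = Record("order", 1)
order_2 = Record("order", 2)

# An order takes part in more than one parent/child relationship.
connect("placed_by", customer, order_1)
connect("placed_by", customer, order_2)
connect("taken_by", salesperson, order_1)

first = first_member(customer, "placed_by")
print(first.key, next_member(first, "placed_by").key,
      owner_of(first, "taken_by").kind)
```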
Network databases had several advantages:
Flexibility: Multiple parent/child relationships allowed a network
database to represent data that did not have a simple hierarchical
structure.
Standardization: The CODASYL standard boosted the popularity
of the network model, and minicomputer vendors such as Digital
Equipment Corporation and Data General implemented network
databases.
Performance: Despite their greater complexity, network
databases boasted performance approaching that of hierarchical
databases. Sets were represented by pointers to physical data
records, and on some systems, the database administrator could
specify data clustering based on a set relationship.
Network databases had their disadvantages, too. Like
hierarchical databases, they were very rigid. The set relationships
and the structure of the records had to be specified in advance.
Changing the database structure typically required rebuilding the
entire database.
2
ENTITY RELATIONSHIP MODEL
Topics Covered:
2.1 Entity
2.2 Attributes
2.3 Keys
2.4 Relation
2.5 Cardinality
2.6 Participation
2.7 Weak Entities
2.8 ER Diagram
2.9 Conceptual Design With ER Model
2.1 ENTITY
The basic object that the ER model represents is an entity,
which is a "thing" in the real world with an independent
existence.
An entity may be an object with a physical existence (a
particular person, car, house, or employee) or it may be an
object with a conceptual existence (a company, a job, or a
university course).
2.2 ATTRIBUTES
Each entity has attributes, the particular properties that
describe it.
For example, an employee entity may be described by the
employee's name, age, address, salary, and job.
A particular entity will have a value for each of its attributes.
The attribute values that describe each entity become a major
part of the data stored in the database.
Several types of attributes occur in the ER model: simple versus
composite; single-valued versus multi-valued; and stored versus
derived.
2.2.1 Composite versus Simple (Atomic) Attributes
Composite attributes can be divided into smaller subparts,
which represent more basic attributes with independent
meanings.
For example, the Address attribute of the employee entity can
be subdivided into Street_Name, City, State, and Zip.
Attributes that are not divisible are called simple or atomic
attributes.
Composite attributes can form a hierarchy; for example, Name
can be subdivided into three simple attributes: First_Name,
Middle_Name, and Last_Name.
The value of a composite attribute is the concatenation of the
values of its constituent simple attributes.
Figure: Composite attributes
2.2.2 Single-valued Versus Multi-valued Attributes
Attributes which have only one value for an entity are called
single-valued attributes.
E.g. for a student entity, the RollNo attribute has only a single
value.
But the phone number attribute may have multiple values. Such
attributes are called multi-valued attributes.
2.2.3 Stored Versus Derived Attributes
Two or more attribute values may be related, for example, the Age
and Birth Date attributes of a person.
For a particular person entity, the value of Age can be
determined from the current (today's) date and the value of that
person's Birth Date.
The Age attribute is hence called a derived attribute.
The attribute from which another attribute value is derived is
called a stored attribute.
In the above example, Birth Date is the stored attribute.
As another example, if we have to calculate the interest on
some principal amount for a given time and a particular rate
of interest, we can simply use the interest formula:
Interest = (N * P * R) / 100
In this case, interest is the derived attribute, whereas principal
amount (P), time (N), and rate of interest (R) are all stored
attributes.
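Both examples can be sketched in Python: stored attributes are kept as fields, while derived attributes are computed on demand. The Person class, its age method, and the simple_interest function are illustrative names assumed for this sketch.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Person:
    name: str
    birth_date: date  # stored attribute

    def age(self, today: date) -> int:
        """Derived attribute: whole years between birth_date and today."""
        years = today.year - self.birth_date.year
        # Subtract one year if this year's birthday has not yet occurred.
        if (today.month, today.day) < (self.birth_date.month, self.birth_date.day):
            years -= 1
        return years

def simple_interest(p: float, n: float, r: float) -> float:
    """Derived attribute: Interest = (N * P * R) / 100."""
    return n * p * r / 100

person = Person("Asha", date(2000, 6, 15))
print(person.age(date(2024, 6, 14)), simple_interest(1000, 2, 5))
```

Storing only the birth date avoids having to update a stored age every year, which is the usual reason for keeping such values derived.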
2.3 KEYS
An important constraint on the entities of an entity type is the
key or uniqueness constraint on attributes.
A key is an attribute (also known as a column or field) or a
combination of attributes that is used to identify records.
Sometimes we have to retrieve data from more than one
table; in those cases we need to join the tables with the help of
keys.
The purpose of a key is to bind data together across tables
without repeating all of the data in every table.
An attribute whose values are distinct for each entity is called a
key attribute, and its values can be used to identify each entity
uniquely.
For example, the Name attribute is a key of the COMPANY
entity type because no two companies are allowed to have the
same name.
For the PERSON entity type, a typical key attribute is
SocialSecurityNumber.
Sometimes, several attributes together form a key, meaning that
the combination of the attribute values must be distinct for each
entity.
If a set of attributes possesses this property, we can define a
composite attribute that becomes a key attribute of the entity
type.
The various types of keys, with examples, are described
below. (For the examples, suppose we have an Employee table with
attributes ID, Name, Address, Department_ID, and Salary.)
(I) Super Key - An attribute or a combination of attributes that is
used to identify the records uniquely is known as a Super Key. A
table can have many Super Keys.
E.g. of Super Key
1 ID
2 ID, Name
3 ID, Address
4 ID, Department_ID
5 ID, Salary
6 Name, Address
7 Name, Address, Department_ID
and so on, as any combination that can identify the records
uniquely is a Super Key.
(II) Candidate Key - It can be defined as a minimal Super Key or
irreducible Super Key. In other words, it is an attribute or a
combination of attributes that identifies the record uniquely, while
none of its proper subsets can identify the records uniquely.
E.g. of Candidate Key
1 ID
2 Name, Address
For the above table we have only two Candidate Keys (i.e.
irreducible Super Keys) that identify the records in the table
uniquely. ID can identify a record uniquely, and similarly the
combination of Name and Address can identify a record uniquely,
but neither Name nor Address alone can be used to identify the
records uniquely, as we might have two employees with
the same name or two employees from the same house.
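The definitions of Super Key and Candidate Key can be checked mechanically on sample data. The sketch below does so in Python. The rows are invented for illustration and the function names are assumptions. Note that uniqueness in one sample table does not prove that a set of attributes is a key; a key is a constraint that must hold for every legal state of the table.

```python
from itertools import combinations

ATTRS = ("ID", "Name", "Address", "Department_ID", "Salary")
# Hypothetical sample rows for the Employee table.
ROWS = [
    (1, "Asha", "12 Hill Road", 10, 50000),
    (2, "Asha", "7 Lake View", 10, 50000),
    (3, "Ravi", "12 Hill Road", 10, 50000),
    (4, "Meera", "3 Park Lane", 20, 42000),
]

def is_super_key(attrs, rows=ROWS):
    """True if no two rows agree on every attribute in attrs."""
    positions = [ATTRS.index(a) for a in attrs]
    projected = {tuple(row[i] for i in positions) for row in rows}
    return len(projected) == len(rows)

def candidate_keys(rows=ROWS):
    """Super keys with no smaller super key inside them (minimal ones)."""
    keys = []
    # Try smaller combinations first so that minimal keys are found first.
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            contains_key = any(set(k) <= set(combo) for k in keys)
            if not contains_key and is_super_key(combo, rows):
                keys.append(combo)
    return keys

print(candidate_keys())
```

On this sample data the function reports ID and the pair (Name, Address), matching the two Candidate Keys discussed above.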
(III) Primary Key - A Candidate Key that is chosen by the database
designer for unique identification of each row in a table is known as
the Primary Key. A Primary Key can consist of one or more attributes
of a table.
E.g. of Primary Key - The database designer can use one of the
Candidate Keys as the Primary Key. In this case we have ID and
(Name, Address) as Candidate Keys; we will choose ID as
the Primary Key, as the other key is a combination of more than one
attribute.
(IV) Foreign Key - A foreign key is an attribute or combination of
attributes in one base table that points to a candidate key
(generally the primary key) of another table. The purpose of the
foreign key is to ensure referential integrity of the data, i.e. only
values that are supposed to appear in the database are permitted.
E.g. of Foreign Key - Consider another table, the Department table,
with attributes Department_ID, Department_Name, Manager_ID, and
Location_ID, and with Department_ID as its Primary Key. Now the
Department_ID attribute of the Employee table (the dependent or child
table) can be defined as a Foreign Key, as it can reference the
Department_ID attribute of the Department table (the referenced
or parent table). A Foreign Key value must match an existing value
in the parent table or be NULL.
(V) Composite Key - If we use multiple attributes to create a
Primary Key, then that Primary Key is called a Composite Key (also
called a Compound Key or Concatenated Key).
E.g. of Composite Key - If we had used (Name, Address) as the
Primary Key, then it would be our Composite Key.
(VI) Alternate Key - An Alternate Key can be any of the Candidate
Keys except for the Primary Key.
E.g. of Alternate Key - (Name, Address), as it is the only other
Candidate Key which is not the Primary Key.
(VII) Secondary Key - Attributes that are not even a Super
Key but can still be used for identification of records (not uniquely)
are known as a Secondary Key.
E.g. of Secondary Key - Name, Address, Salary,
Department_ID, etc., as they can identify the records but they might
not be unique.
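The key types above map onto SQL constraints roughly as follows. This is a minimal sketch using Python's built-in sqlite3 module. The Department table is trimmed to two columns for brevity, and the sample rows and the try_insert helper are assumptions for illustration. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default in SQLite
conn.executescript("""
CREATE TABLE Department (
    Department_ID   INTEGER PRIMARY KEY,           -- primary key
    Department_Name TEXT NOT NULL
);
CREATE TABLE Employee (
    ID            INTEGER PRIMARY KEY,             -- primary key
    Name          TEXT NOT NULL,
    Address       TEXT NOT NULL,
    Department_ID INTEGER
        REFERENCES Department(Department_ID),      -- foreign key
    Salary        INTEGER,
    UNIQUE (Name, Address)                         -- alternate (candidate) key
);
INSERT INTO Department VALUES (10, 'Sales');
INSERT INTO Employee VALUES (1, 'Asha', '12 Hill Road', 10, 50000);
INSERT INTO Employee VALUES (2, 'Ravi', '7 Lake View', NULL, 42000);
""")

def try_insert(row):
    """Insert an Employee row; report whether a key constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert((3, "Meera", "3 Park Lane", 10, 45000)))   # valid row
print(try_insert((4, "Kiran", "9 Elm Street", 99, 45000)))  # no department 99
print(try_insert((5, "Asha", "12 Hill Road", 10, 30000)))   # duplicate Name, Address
```

The second employee row in the script shows that a NULL foreign key value is accepted, as stated in the Foreign Key example.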
2.4 RELATION
There are several implicit relationships among the various entity
types.
In fact, whenever an attribute of one entity type refers to another
entity type, some relationship exists.
For example, the attribute Manager of department refers to an
employee who manages the department.
In the ER model, these references should be represented not as
attributes but as relationships. For example, there is a
relationship borrower between the entities customer and loan,
which can be shown as follows:
Figure: E-R diagram corresponding to customers and loans.
2.5 CARDINALITY
Mapping cardinalities, or cardinality ratios, express the
number of entities to which another entity can be associated via a
relationship set.
For a relationship set R between entity sets A and B, the
mapping cardinality must be one of the following four types:
1) One to one
2) One to many
3) Many to one
4) Many to many
2.5.1 One to one:
An entity in A is associated with at most one entity in B, and
an entity in B is associated with at most one entity in A.
2.5.2 One to many:
An entity in A is associated with any number (zero or more)
of entities in B. An entity in B, however, can be associated with at
most one entity in A.
2.5.3 Many to one:
An entity in A is associated with at most one entity in B. An
entity in B, however, can be associated with any number (zero or
more) of entities in A.
2.5.4 Many to many:
An entity in A is associated with any number (zero or more)
of entities in B, and an entity in B is associated with any number
(zero or more) of entities in A.
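The four definitions can be turned into a small check over the pairs of a relationship set. This Python sketch classifies one particular instance. The function name and the sample pairs are assumptions, and in a real design the mapping cardinality is a constraint on what is allowed, not a property measured from one sample.

```python
from collections import Counter

def mapping_cardinality(pairs):
    """Classify a set of (a, b) pairs relating entities of A to entities of B."""
    pairs = set(pairs)
    a_counts = Counter(a for a, _ in pairs)
    b_counts = Counter(b for _, b in pairs)
    a_has_many = any(n > 1 for n in a_counts.values())  # some A entity has several B
    b_has_many = any(n > 1 for n in b_counts.values())  # some B entity has several A
    if not a_has_many and not b_has_many:
        return "one to one"
    if a_has_many and not b_has_many:
        return "one to many"
    if not a_has_many and b_has_many:
        return "many to one"
    return "many to many"

print(mapping_cardinality([("a1", "b1"), ("a1", "b2")]))
```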
Figure: Mapping cardinalities. (a) One to one. (b) One to many.
Figure: Mapping cardinalities. (a) Many to one. (b) Many to many.
2.6 PARTICIPATION
The participation of an entity set E in a relationship set R is said
to be total if every entity in E participates in at least one
relationship in R.
If only some entities in E participate in relationships in R, the
participation of entity set E in relationship R is said to be partial.
For example, we expect every loan entity to be related to at
least one customer through the borrower relationship.
Therefore the participation of loan in the relationship set
borrower is total.
In contrast, an individual can be a bank customer whether or not
she has a loan with the bank.
Hence, it is possible that only some of the customer entities are
related to the loan entity set through the borrower relationship,
and the participation of customer in the borrower relationship set
is therefore partial.
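The distinction can be expressed as a simple set comparison. In this Python sketch the customer names, loan numbers, and the participation function are invented for illustration. As with cardinality, the check describes one instance, whereas a participation constraint applies to every legal state.

```python
def participation(entity_set, relationship, position):
    """Return 'total' if every entity appears in some pair, else 'partial'.

    position is 0 or 1: which side of each pair the entity set occupies.
    """
    participating = {pair[position] for pair in relationship}
    return "total" if set(entity_set) <= participating else "partial"

customers = {"Hayes", "Jones", "Smith"}
loans = {"L-15", "L-17"}
# borrower pairs are (customer, loan); Smith has no loan.
borrower = {("Hayes", "L-15"), ("Jones", "L-17"), ("Jones", "L-15")}

print(participation(loans, borrower, 1), participation(customers, borrower, 0))
```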
2.7 WEAK ENTITIES
An entity set may not have sufficient attributes to form a primary
key.
Such an entity set is termed a weak entity set.
An entity set that has a primary key is termed a strong entity set.
As an illustration, consider the entity set payment, which has the
three attributes: payment-number, payment-date, and payment-amount.
Payment numbers are typically sequential numbers, starting
from 1, generated separately for each loan.
Thus, although each payment entity is distinct, payments for
different loans may share the same payment number. Thus, this
entity set does not have a primary key; it is a weak entity set.
For a weak entity set to be meaningful, it must be associated
with another entity set, called the identifying or owner entity set.
Every weak entity must be associated with an identifying entity;
that is, the weak entity set is said to be existence dependent on
the identifying entity set.
The identifying entity set is said to own the weak entity set that it
identifies.
The relationship associating the weak entity set with the
identifying entity set is called the identifying relationship.
The identifying relationship is many to one from the weak entity
set to the identifying entity set, and the participation of the weak
entity set in the relationship is total.
In our example, the identifying entity set for payment is loan,
and a relationship loan-payment that associates payment
entities with their corresponding loan entities is the identifying
relationship.
Although a weak entity set does not have a primary key, we
nevertheless need a means of distinguishing among all those
entities in the weak entity set that depend on one particular
strong entity.
The discriminator of a weak entity set is a set of attributes that
allows this distinction to be made.
For example, the discriminator of the weak entity set payment is
the attribute payment-number, since, for each loan, a payment
number uniquely identifies one single payment for that loan.
The discriminator of a weak entity set is also called the partial
key of the entity set.
The primary key of a weak entity set is formed by the primary
key of the identifying entity set, plus the weak entity set's
discriminator.
In the case of the entity set payment, its primary key is
{loan-number, payment-number}, where loan-number is the primary
key of the identifying entity set, namely loan, and
payment-number distinguishes payment entities within the same loan.
The identifying relationship set should have no descriptive
attributes, since any required attributes can be associated with
the weak entity set.
A weak entity set can participate in relationships other than the
identifying relationship.
For instance, the payment entity could participate in a
relationship with the account entity set, identifying the account
from which the payment was made.
A weak entity set may participate as owner in an identifying
relationship with another weak entity set.
It is also possible to have a weak entity set with more than one
identifying entity set.
A particular weak entity would then be identified by a
combination of entities, one from each identifying entity set.
The primary key of the weak entity set would consist of the
union of the primary keys of the identifying entity sets, plus the
discriminator of the weak entity set.
In E-R diagrams, a doubly outlined box indicates a weak entity
set, and a doubly outlined diamond indicates the corresponding
identifying relationship.
In the figure below, the weak entity set payment depends on the
strong entity set loan via the relationship set loan-payment.
The figure also illustrates the use of double lines to indicate total
participation: the participation of the (weak) entity set payment
in the relationship loan-payment is total, meaning that every
payment must be related via loan-payment to some loan.
Finally, the arrow from loan-payment to loan indicates that each
payment is for a single loan. The discriminator of a weak entity
set also is underlined, but with a dashed, rather than a solid,
line.
Figure: E-R diagram with a weak entity set.
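In a relational implementation, the weak entity set payment is commonly represented by a table whose primary key combines the owner's key with the discriminator. The sketch below shows one way to do this with Python's built-in sqlite3 module. The column types, sample rows, the ON DELETE CASCADE choice, and the add_payment helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce references
conn.executescript("""
CREATE TABLE loan (
    loan_number TEXT PRIMARY KEY,
    amount      INTEGER NOT NULL
);
CREATE TABLE payment (
    loan_number    TEXT NOT NULL
        REFERENCES loan(loan_number) ON DELETE CASCADE,  -- existence dependency
    payment_number INTEGER NOT NULL,                     -- discriminator
    payment_date   TEXT,
    payment_amount INTEGER,
    PRIMARY KEY (loan_number, payment_number)            -- owner key + discriminator
);
INSERT INTO loan VALUES ('L-15', 1500);
INSERT INTO loan VALUES ('L-17', 1000);
INSERT INTO payment VALUES ('L-15', 1, '2024-01-05', 100);
INSERT INTO payment VALUES ('L-17', 1, '2024-01-09', 75);
""")

def add_payment(loan_number, payment_number, payment_date, amount):
    """Insert a payment; return False if a key constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO payment VALUES (?, ?, ?, ?)",
                         (loan_number, payment_number, payment_date, amount))
        return True
    except sqlite3.IntegrityError:
        return False

def payment_count(loan_number):
    """Number of payments recorded for one loan."""
    row = conn.execute("SELECT COUNT(*) FROM payment WHERE loan_number = ?",
                       (loan_number,)).fetchone()
    return row[0]

print(add_payment("L-15", 2, "2024-02-05", 100))  # new payment for an existing loan
print(add_payment("L-99", 1, "2024-02-05", 100))  # no such loan
```

Payment number 1 appears under both loans in the script, which is allowed because only the combination of loan number and payment number must be unique.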
2.8 ER DIAGRAM: SPECIALIZATION, GENERALIZATION AND AGGREGATION
2.8.1 Specialization:
An entity set may include subgroupings of entities that are
distinct in some way from other entities in the set.
For instance, a subset of entities within an entity set may have
attributes that are not shared by all the entities in the entity set.
The E-R model provides a means for representing these
distinctive entity groupings.
Consider an entity set person, with attributes name, street, and
city. A person may be further classified as one of the following:
Customer
Employee
Each of these person types is described by a set of attributes
that includes all the attributes of entity set person plus possibly
additional attributes.
For example, customer entities may be described further by the
attribute customer-id, whereas employee entities may be
described further by the attributes employee-id and salary.
The process of designating subgroupings within an entity set is
called specialization.
The specialization of person allows us to distinguish among
persons according to whether they are employees or customers.
As another example, suppose the bank wishes to divide

accounts into two categories, checking account and savings
account. Savings accounts need a minimum balance, but the
bank may set interest rates differently for different customers,
offering better rates to favored customers.
Checking accounts have a fixed interest rate, but offer an

overdraft facility; the overdraft amount on a checking account
must be recorded.
The bank could then create two specializations of account,

namely savings-account and checking-account.
As we saw earlier, account entities are described by the

attributes account-number and balance.
The entity set savings-account would have all the attributes of

account and an additional attribute interest-rate.
The entity set checking-account would have all the attributes of

account, and an additional attribute overdraft-amount.
We can apply specialization repeatedly to refine a design
scheme. For instance, bank employees may be further
classified as one of the following:
Officer
Teller
Secretary
Each of these employee types is described by a set of attributes
that includes all the attributes of entity set employee plus
additional attributes. For example, officer entities may be
described further by the attribute office-number, teller entities by
the attributes station-number and hours-per-week, and
secretary entities by the attribute hours-per-week. Further,
secretary entities may participate in a relationship secretary-for,
which identifies which employees are assisted by a secretary.
An entity set may be specialized by more than one
distinguishing feature. In our example, the distinguishing feature
among employee entities is the job the employee performs.
Another, coexistent, specialization could be based on whether
the person is a temporary (limited-term) employee or a
permanent employee, resulting in the entity sets
temporary-employee and permanent-employee. When more than one
specialization is formed on an entity set, a particular entity may
belong to multiple specializations. For instance, a given
employee may be a temporary employee who is a secretary.
In terms of an E-R diagram, specialization is depicted by a
triangle component labeled ISA. The label ISA stands for "is a"
and represents, for example, that a customer "is a" person. The
ISA relationship may also be referred to as a superclass-subclass
relationship. Higher- and lower-level entity sets are
depicted as regular entity sets, that is, as rectangles containing
the name of the entity set.
2.8.2 Generalization:
The refinement from an initial entity set into successive levels of
entity subgroupings represents a top-down design process in
which distinctions are made explicit. The design process may
also proceed in a bottom-up manner, in which multiple entity
sets are synthesized into a higher-level entity set on the basis of
common features. The database designer may have first
identified a customer entity set with the attributes name, street,
city, and customer-id, and an employee entity set with the
attributes name, street, city, employee-id, and salary. There are
similarities between the customer entity set and the employee
entity set in the sense that they have several attributes in
common. This commonality can be expressed by generalization,
which is a containment relationship that exists between a
higher-level entity set and one or more lower-level entity sets. In
our example, person is the higher-level entity set and customer
and employee are lower-level entity sets.
Higher- and lower-level entity sets also may be designated by
the terms superclass and subclass, respectively. The person
entity set is the superclass of the customer and employee
subclasses.
For all practical purposes, generalization is a simple inversion of
specialization. We will apply both processes, in combination, in
the course of designing the E-R schema for an enterprise. In
terms of the E-R diagram itself, we do not distinguish between
specialization and generalization. New levels of entity
representation will be distinguished (specialization) or
synthesized (generalization) as the design schema comes to
express fully the database application and the user
requirements of the database.
Figure 2.17 Specialization and generalization.
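One common way to carry an ISA hierarchy into tables is to give each lower-level entity set its own table that shares the key of the higher-level one. The sketch below uses Python's sqlite3 module and is only an illustration: person_id is a hypothetical surrogate key (the notes give person only name, street, and city), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,   -- hypothetical surrogate key
    name TEXT, street TEXT, city TEXT
);
-- Each lower-level entity set stores only its additional attributes
-- and refers to the higher-level entity set for the inherited ones.
CREATE TABLE customer (
    person_id   INTEGER PRIMARY KEY REFERENCES person(person_id),
    customer_id TEXT
);
CREATE TABLE employee (
    person_id   INTEGER PRIMARY KEY REFERENCES person(person_id),
    employee_id TEXT,
    salary      INTEGER
);
INSERT INTO person   VALUES (1, 'Hayes', 'Main', 'Harrison');
INSERT INTO person   VALUES (2, 'Jones', 'North', 'Rye');
INSERT INTO customer VALUES (1, 'C-019');
INSERT INTO employee VALUES (2, 'E-007', 40000);
""")

# An employee "is a" person: inherited attributes come from person.
employees = conn.execute("""
    SELECT p.name, p.city, e.employee_id, e.salary
    FROM employee AS e
    JOIN person AS p ON p.person_id = e.person_id
""").fetchall()
```

Other mappings exist (for example, one table per lower-level entity set that repeats the inherited attributes); this one keeps each attribute in a single place.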

2.8.3 Aggregation:
One limitation of the E-R model is that it cannot express
relationships among relationships.
To illustrate the need for such a construct, consider the ternary
relationship works-on, which we saw earlier, between
employee, branch, and job.
Now, suppose we want to record managers for tasks performed
by an employee at a branch; that is, we want to record
managers for (employee, branch, job) combinations. Let us
assume that there is an entity set manager.
One alternative for representing this relationship is to create a
quaternary relationship manages between employee, branch,
job, and manager. (A quaternary relationship is required: a
binary relationship between manager and employee would not
permit us to represent which (branch, job) combinations of an
employee are managed by which manager.)
Using the basic E-R modeling constructs, we obtain the E-R
diagram as follows:
Figure: E-R diagram with redundant relationships.
It appears that the relationship sets works-on and manages can
be combined into one single relationship set.
Nevertheless, we should not combine them into a single
relationship, since some employee, branch, job combinations
may not have a manager.
There is redundant information in the resultant figure, however,
since every employee, branch, job combination in manages is
also in works-on.
If the manager were a value rather than a manager entity, we
could instead make manager a multivalued attribute of the
relationship works-on.
But doing so makes it more difficult (logically as well as in
execution cost) to find, for example, employee-branch-job triples
for which a manager is responsible. Since the manager is a
manager entity, this alternative is ruled out in any case.
The best way to model a situation such as the one just
described is to use aggregation.
Aggregation is an abstraction through which relationships are
treated as higher-level entities.
The following figure shows a notation for aggregation commonly
used to represent the above situation.
Figure: E-R diagram with aggregation.
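Treating works-on as a higher-level entity can be sketched in tables: manages refers to a whole works-on combination rather than repeating an independent relationship. This is an illustrative sketch with Python's sqlite3 module; the table layout and sample rows are assumptions, and entity sets are reduced to plain text keys to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE works_on (
    employee TEXT, branch TEXT, job TEXT,
    PRIMARY KEY (employee, branch, job)
);
-- manages relates a manager to an aggregated works_on combination.
CREATE TABLE manages (
    employee TEXT, branch TEXT, job TEXT, manager TEXT,
    PRIMARY KEY (employee, branch, job, manager),
    FOREIGN KEY (employee, branch, job)
        REFERENCES works_on (employee, branch, job)
);
INSERT INTO works_on VALUES ('Hayes', 'Downtown', 'teller');
INSERT INTO works_on VALUES ('Jones', 'Perryridge', 'auditor');
INSERT INTO manages  VALUES ('Hayes', 'Downtown', 'teller', 'Smith');
""")

# A manager cannot be recorded for a combination that is not in works_on.
try:
    conn.execute("INSERT INTO manages VALUES ('Lee', 'Mianus', 'clerk', 'Smith')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

# Combinations without a manager remain representable.
unmanaged = conn.execute("""
    SELECT w.employee, w.branch, w.job
    FROM works_on AS w
    LEFT JOIN manages AS m
      ON m.employee = w.employee AND m.branch = w.branch AND m.job = w.job
    WHERE m.manager IS NULL
""").fetchall()
```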
2.9 CONCEPTUAL DESIGN WITH E-R MODEL
An E-R diagram can express the overall logical structure of a
database graphically. E-R diagrams are simple and clear,
qualities that may well account in large part for the widespread
use of the E-R model. Such a diagram consists of the following
major components:
Rectangles, which represent entity sets
Ellipses, which represent attributes
Diamonds, which represent relationship sets
Lines, which link attributes to entity sets and entity sets to
relationship sets
Double ellipses, which represent multivalued attributes
Dashed ellipses, which denote derived attributes
Double lines, which indicate total participation of an entity in
a relationship set
Double rectangles, which represent weak entity sets
Consider the entity-relationship diagram in the figure below, which
consists of two entity sets, customer and loan, related through a
binary relationship set borrower. The attributes associated with
customer are customer-id, customer-name, customer-street,
and customer-city. The attributes associated with loan are
loan-number and amount. In the figure, attributes of an entity set
that are members of the primary key are underlined. The
relationship set borrower may be many-to-many, one-to-many,
many-to-one, or one-to-one. To distinguish among these types,
we draw either a directed line (→) or an undirected line (-)
between the relationship set and the entity set in question.
A directed line from the relationship set borrower to the entity
set loan specifies that borrower is either a one-to-one or
many-to-one relationship set, from customer to loan;
borrower cannot be a many-to-many or a one-to-many
relationship set from customer to loan.
An undirected line from the relationship set borrower to the
entity set loan specifies that borrower is either a many-to-many
or one-to-many relationship set from customer to loan.
Figure: E-R diagram corresponding to customers and loans.
If a relationship set also has some attributes associated with it,
then we link these attributes to that relationship set. The following
figure shows how composite attributes can be represented in
the E-R notation.
Here, a composite attribute name, with component attributes
first-name, middle-initial, and last-name, replaces the simple
attribute customer-name of customer. Also, a composite
attribute address, whose component attributes are street, city,
state, and zip-code, replaces the attributes customer-street and
customer-city of customer. The attribute street is itself a
composite attribute whose component attributes are
street-number, street-name, and apartment-number.
The figure also illustrates a multivalued attribute phone-number,
depicted by a double ellipse, and a derived attribute age,
depicted by a dashed ellipse.
Figure: E-R diagram with composite, multivalued, and derived
attributes.
2.10 ENTITY v/s ATTRIBUTE
Should address be an attribute of Employees or an entity
(connected to Employees by a relationship)?
The answer depends upon the use we want to make of address
information, and the semantics of the data:
o If we have several addresses per employee, address
must be an entity (since attributes cannot be set-valued).
o If the structure (city, street, etc.) is important, e.g., we
want to retrieve employees in a given city, address
must be modelled as an entity (since attribute values
are atomic).
Works_In2 does not allow an employee to work in a
department for two or more periods.
This is similar to the problem of wanting to record several
addresses for an employee: we want to record several values of
the descriptive attributes for each instance of this relationship.
An alternative is to create an entity set called Addresses and
to record associations between employees and addresses using a
relationship (say, Has_Address). This more complex alternative is
necessary in two situations:
We have to record more than one address for an employee.
We want to capture the structure of an address in our ER
diagram. For example, we might break down an address into city,
state, country, and Zip code, in addition to a string for street
information. By representing an address as an entity with these
attributes, we can support queries such as "Find all employees with
an address in Madison, WI."
For another example of when to model a concept as an
entity set rather than an attribute, consider the relationship set
shown in following diagram:
Intuitively, it records the interval during which an employee
works for a department. Now suppose that it is possible for an
employee to work in a given department over more than one
period.
This possibility is ruled out by the ER diagram's semantics,
because a relationship is uniquely identified by the participating
entities. The problem is that we want to record several values for
the descriptive attributes for each instance of the Works_In2
relationship. (This situation is analogous to wanting to record
several addresses for each employee.) We can address this
problem by introducing an entity set called, say, Duration, with
attributes from and to, as shown in the following figure:
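The effect of the key on Works_In2 can be sketched with two hypothetical tables in Python's sqlite3 module. The table name works_in3, the column names ssn, did, from_date, and to_date, and the sample periods are assumptions for illustration; the second table simply folds the Duration attributes into the key, which is one possible relational rendering of the design with a Duration entity set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- from/to as descriptive attributes: one row per (employee, department).
CREATE TABLE works_in2 (
    ssn TEXT, did INTEGER, from_date TEXT, to_date TEXT,
    PRIMARY KEY (ssn, did)
);
-- Duration takes part in the relationship, so it is part of the key.
CREATE TABLE works_in3 (
    ssn TEXT, did INTEGER, from_date TEXT, to_date TEXT,
    PRIMARY KEY (ssn, did, from_date, to_date)
);
""")

# Two periods for the same employee in the same department (made-up data).
periods = [
    ("123-22-3666", 51, "1991-01-01", "1993-06-30"),
    ("123-22-3666", 51, "1996-01-01", "1997-12-31"),
]

def count_accepted(table):
    """Insert both periods and return how many the key constraint allows."""
    accepted = 0
    for row in periods:
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", row)
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # a second period for the same pair violates the key
    return accepted

accepted_in2 = count_accepted("works_in2")
accepted_in3 = count_accepted("works_in3")
```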
2.10 ENTITY v/s RELATIONSHIP
Suppose that each department manager is given a discretionary
budget (dbudget), as shown in the following figure, in which we
have also renamed the relationship set to Manages2.
Figure: Entity versus Relationship
Given a department, we know the manager, as well as the
manager's starting date and budget for that department.
This approach is natural if we assume that a manager receives
a separate discretionary budget for each department that he or
she manages.
But what if the discretionary budget is a sum that covers all
departments managed by that employee?
In this case, each Manages2 relationship that involves a given
employee will have the same value in the dbudget field,
leading to redundant storage of the same information. Another
problem with this design is that it is misleading; it suggests that
the budget is associated with the relationship, when it is actually
associated with the manager.
We can address these problems by introducing a new entity set
called Managers (which can be placed below Employees in an
ISA hierarchy, to show that every manager is also an
employee).
The attributes since and dbudget now describe a manager
entity, as intended. As a variation, while every manager has a
budget, each manager may have a different starting date (as
manager) for each department. In this case dbudget is an
attribute of Managers, but since is an attribute of the relationship
set between managers and departments.
The imprecise nature of ER modeling can thus make it difficult
to recognize underlying entities, and we might associate
attributes with relationships rather than the appropriate entities.
In general, such mistakes lead to redundant storage of the
same information and can cause many problems.
2.11 BINARY v/s TERNARY RELATIONSHIP
Consider the ER diagram shown in the following figure. It models a
situation in which an employee can own several policies, each
policy can be owned by several employees, and each
dependent can be covered by several policies. Suppose that we
have the following additional requirements:
A policy cannot be owned jointly by two or more employees.
Every policy must be owned by some employee.
Dependents is a weak entity set, and each dependent entity
is uniquely identified by taking pname in conjunction with the
policyid of a policy entity (which, intuitively, covers the given
dependent).
Figure: Policies as an Entity Set
The first requirement suggests that we impose a key
constraint on Policies with respect to Covers, but this constraint has
the unintended side effect that a policy can cover only one
dependent. The second requirement suggests that we impose a
total participation constraint on Policies. This solution is acceptable
if each policy covers at least one dependent. The third requirement
forces us to introduce an identifying relationship that is binary (in
our version of ER diagrams, although there are versions in which
this is not the case).
Even ignoring the third requirement, the best way to model
this situation is to use two binary relationships, as shown in
the following figure:
Figure: Policy Revisited
This example really has two relationships involving Policies, and
our attempt to use a single ternary relationship is inappropriate.
There are situations, however, where a relationship inherently
associates more than two entities.
As a typical example of a ternary relationship, consider entity
sets Parts, Suppliers, and Departments, and a relationship set
Contracts (with descriptive attribute qty) that involves all of
them. A contract specifies that a supplier will supply (some
quantity of) a part to a department. This relationship cannot be
adequately captured by a collection of binary relationships
(without the use of aggregation). With binary relationships, we
can denote that a supplier 'can supply' certain parts, that a
department 'needs' some parts, or that a department 'deals with'
a certain supplier. No combination of these relationships
expresses the meaning of a contract adequately, for at least two
reasons:
The facts that supplier S can supply part P, that department D
needs part P, and that D will buy from S do not necessarily imply
that department D indeed buys part P from supplier S.
We cannot represent the qty attribute of a contract cleanly.
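The first reason can be checked on a tiny example. The sketch below, in plain Python with made-up data, combines three binary relationships and shows that they imply a (supplier, part, department) triple that is not an actual contract; the relation names mirror the informal 'can supply', 'needs', and 'deals with' above.

```python
# Binary relationships (illustrative data, not from the notes).
can_supply = {("S1", "P1"), ("S2", "P1")}   # (supplier, part)
needs      = {("D1", "P1")}                 # (department, part)
deals_with = {("D1", "S1"), ("D1", "S2")}   # (department, supplier)

# The ternary relationship: which contracts really exist.
contracts = {("S1", "P1", "D1")}            # (supplier, part, department)

# Every triple that the three binary relationships are consistent with.
implied = {
    (s, p, d)
    for (s, p) in can_supply
    for (d, needed) in needs
    if needed == p and (d, s) in deals_with
}

# Triples suggested by the binary facts that are not contracts.
spurious = implied - contracts
```

Here D1 deals with S2 (perhaps for some other part) and S2 can supply P1, yet D1 buys P1 only from S1, so the binary facts alone overstate what the contracts say.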
2.12 AGGREGATE v/s TERNARY RELATIONSHIP
The choice between using aggregation or a ternary relationship
is mainly determined by the existence of a relationship that
relates a relationship set to an entity set (or second relationship
set). The choice may also be guided by certain integrity
constraints that we want to express. For example, a project can
be sponsored by any number of departments, a department can
sponsor one or more projects, and each sponsorship is
monitored by one or more employees. If we don't need to record
the until attribute of Monitors, then we might reasonably use a
ternary relationship, say, Sponsors2, as shown in the following
figure.
Consider the constraint that each sponsorship (of a project by a
department) be monitored by at most one employee. We cannot
express this constraint in terms of the Sponsors2 relationship
set. On the other hand, we can easily express the constraint by
drawing an arrow from the aggregated relationship Sponsors to
the relationship Monitors. Thus, the presence of such a
constraint serves as another reason for using aggregation
rather than a ternary relationship set.
Figure: Using a Ternary Relationship instead of Aggregation

Summary:
Conceptual design follows requirements analysis.
o Yields a high-level description of data to be stored.
ER model popular for conceptual design.
o Constructs are expressive, close to the way people
think about their applications.
Basic constructs: entities, relationships, and attributes (of
entities and relationships).
Some additional constructs: weak entities, ISA hierarchies,
and aggregation.
Several kinds of integrity constraints can be expressed in the
ER model: key constraints, participation constraints,
and overlap/covering constraints for ISA hierarchies.
Some foreign key constraints are also implicit in the
definition of a relationship set.
Some constraints (notably, functional dependencies) cannot
be expressed in the ER model.
Constraints play an important role in determining the best
database design for an enterprise.
ER design is subjective. There are often many ways to
model a given scenario! Analyzing alternatives can be tricky,
especially for a large enterprise. Common choices include:
o Entity vs. attribute, entity vs. relationship, binary or
n-ary relationship, whether or not to use ISA
hierarchies, and whether or not to use aggregation.
To ensure good database design, the resulting relational
schema should be analyzed and refined further. FD
information and normalization techniques are especially
useful.
3
RELATIONAL MODEL
Topics covered
3.1 Introduction to Relational Model
3.2 Creating and modifying Relations using SQL
3.3 Integrity constraints over the Relation
3.4 Logical Database Design: ER to Relational
3.5 Relational Algebra
3.1 INTRODUCTION TO RELATIONAL MODEL:

The relational model represents the database as a collection
of relations. Informally, each relation resembles a table of values or,
to some extent, a "flat" file of records. When a relation is thought of
as a table of values, each row in the table represents a collection of
related data values. In the relational model, each row in the table
represents a fact that typically corresponds to a real world entity or
relationship. The table name and column names are used to help in
interpreting the meaning of the values in each row. In the formal
relational model terminology, a row is called a tuple, a column
header is called an attribute, and the table is called a relation. The
data type describing the types of values that can appear in each
column is called a domain. We now define these terms (domain,
tuple, attribute, and relation) more precisely.
Figure: The account relation.
3.2 CREATING AND MODIFYING RELATIONS USING SQL
3.2.1 Creating Relations: (CREATE TABLE STATEMENT)
The CREATE TABLE statement defines a new
table (relation) in the database and prepares it to accept data. The
various clauses of the statement specify the elements of the table
definition.
Figure: Basic CREATE TABLE syntax diagram
The following SQL CREATE TABLE statement defines a new table to
store the products data:
CREATE TABLE PRODUCTS
(MFR_ID CHAR(3),
PRODUCT_ID CHAR(5),
DESCRIPTION VARCHAR(20),
PRICE MONEY,
QTY_ON_HAND INTEGER)
Table created
Although more cryptic than the previous SQL statements,
the CREATE TABLE statement is still fairly straightforward. It
assigns the name PRODUCTS to the new table and specifies the
name and type of data stored in each of its five columns.
Once the table has been created, you can fill it with data.
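To try the statement without a database server, it can be run against an in-memory SQLite database from Python. This is only a sketch: SQLite accepts the MONEY type name but does not implement a dedicated money type, and the inserted row is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PRODUCTS
    (MFR_ID CHAR(3),
     PRODUCT_ID CHAR(5),
     DESCRIPTION VARCHAR(20),
     PRICE MONEY,
     QTY_ON_HAND INTEGER)
""")

# Fill the new table with one illustrative row.
conn.execute(
    "INSERT INTO PRODUCTS VALUES (?, ?, ?, ?, ?)",
    ("ACI", "41004", "Size 4 Widget", 117.00, 139),
)

# List the column names the DBMS recorded for the table.
columns = [row[1] for row in conn.execute("PRAGMA table_info(PRODUCTS)")]
stored = conn.execute(
    "SELECT DESCRIPTION, QTY_ON_HAND FROM PRODUCTS"
).fetchone()
```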
3.2.1 Modifying Relations: (ALTER TABLE STATEMENT)
After a table has been in use for some time, users often
discover that they want to store additional information about the
entities represented in the table.
Figure : ALTER TABLE statement syntax diagram

The ALTER TABLE statement can:
Add a column definition to a table
Drop a column from a table
Change the default value for a column
Add or drop a primary key for a table
Add or drop a new foreign key for a table
Add or drop a uniqueness constraint for a table
Add or drop a check constraint for a table.
For example: Add a minimum inventory level column to the
PRODUCTS table.
ALTER TABLE PRODUCTS
ADD MIN_QTY INTEGER NOT NULL WITH DEFAULT 0
Because MIN_QTY is declared NOT NULL with a default of 0, the
column will have the value zero (0) for existing products, which is
appropriate. A column added without a default would instead hold
NULL values in the rows that already exist.
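The effect on existing rows can be observed with a small SQLite session in Python. This is a sketch under assumptions: the WITH DEFAULT spelling above is dialect-specific, so the SQLite form ADD COLUMN ... DEFAULT is used here instead, and the table is cut down to three columns with one made-up row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCTS (MFR_ID CHAR(3), PRODUCT_ID CHAR(5), QTY_ON_HAND INTEGER)"
)
conn.execute("INSERT INTO PRODUCTS VALUES ('ACI', '41004', 139)")

# SQLite dialect of the ALTER TABLE statement shown above.
conn.execute("ALTER TABLE PRODUCTS ADD COLUMN MIN_QTY INTEGER NOT NULL DEFAULT 0")

# The row that existed before the change now carries the default value.
existing = conn.execute("SELECT PRODUCT_ID, MIN_QTY FROM PRODUCTS").fetchall()
```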
3.3 INTEGRITY CONSTRAINTS OVER THE RELATION:

To preserve the consistency and correctness of its stored
data, a relational DBMS typically imposes one or more data
integrity constraints. These constraints restrict the data values that
can be inserted into the database or created by a database update.
Several different types of data integrity constraints are commonly
found in relational databases, including:
Required data: Some columns in a database must contain a
valid data value in every row; they are not allowed to contain
missing or NULL values. In the sample database, every order must
have an associated customer who placed the order. The DBMS can
be asked to prevent NULL values in this column.
Validity checking: Every column in a database has a domain, a
set of data values that are legal for that column. The DBMS can be
asked to prevent other data values in these columns.
Entity integrity: The primary key of a table must contain a unique
value in each row, which is different from the values in all other
rows. Duplicate values are illegal, because they wouldn't allow the
database to distinguish one entity from another. The DBMS can be
asked to enforce this unique values constraint.
Referential integrity: A foreign key in a relational database links
each row in the child table containing the foreign key to the row of
the parent table containing the matching primary key value. The
DBMS can be asked to enforce this foreign key/primary key
constraint.
Other data relationships: The real-world situation modeled by a
database will often have additional constraints that govern the
legal data values that may appear in the database.
The DBMS can be asked to check modifications to the tables to
make sure that their values are constrained in this way.
Business rules: Updates to a database may be constrained by
business rules governing the real-world transactions that are
represented by the updates.
Consistency: Many real-world transactions cause multiple
updates to a database. The DBMS can be asked to enforce this
type of consistency rule or to support applications that implement
such rules.
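Four of these constraint types can be declared directly in a table definition. The sketch below uses Python's sqlite3 module; the customers and orders tables, their columns, and the sample values are hypothetical stand-ins for the sample database mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for referential integrity in SQLite
conn.executescript("""
CREATE TABLE customers (
    cust_num INTEGER PRIMARY KEY,
    company  TEXT NOT NULL
);
CREATE TABLE orders (
    order_num INTEGER PRIMARY KEY,                           -- entity integrity
    cust      INTEGER NOT NULL REFERENCES customers(cust_num), -- required data + referential integrity
    qty       INTEGER CHECK (qty > 0)                        -- validity checking
);
INSERT INTO customers VALUES (2101, 'Acme');
INSERT INTO orders    VALUES (1, 2101, 5);
""")

# One offending INSERT per constraint type.
violations = {
    "required data":         "INSERT INTO orders VALUES (2, NULL, 5)",
    "validity checking":     "INSERT INTO orders VALUES (3, 2101, 0)",
    "entity integrity":      "INSERT INTO orders VALUES (1, 2101, 7)",
    "referential integrity": "INSERT INTO orders VALUES (4, 9999, 1)",
}

rejected = {}
for name, statement in violations.items():
    try:
        conn.execute(statement)
        rejected[name] = False
    except sqlite3.IntegrityError:
        rejected[name] = True  # the DBMS refused the update

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Business rules and multi-update consistency usually need triggers, transactions, or application code and are not covered by this sketch.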
3.4 LOGICAL DATABASE DESIGN: ER TO RELATIONAL
The ER model is convenient for representing an initial,
high-level database design. Given an ER diagram describing a
database, a standard approach is taken to generating a relational
database schema that closely approximates the ER design. (The
translation is approximate to the extent that we cannot capture all
the constraints implicit in the ER design using SQL, unless we use
certain SQL constraints that are costly to check.) We now describe
how to translate an ER diagram into a collection of tables with
associated constraints, that is, a relational database schema.
3.4.1 Entity Sets to Tables
An entity set is mapped to a relation in a straightforward
way: Each attribute of the entity set becomes an attribute of the
table. Note that we know both the domain of each attribute and the
(primary) key of an entity set. Consider the Employees entity set
with attributes ssn, name, and lot shown in following Figure.
Figure: The Employees Entity Set

A possible instance of the Employees entity set, containing
three Employees entities, is shown in following Figure in a tabular
format.
Figure: An Instance of the Employees Entity Set
3.5 RELATIONAL ALGEBRA:

The relational algebra is a procedural query language. It
consists of a set of operations that take one or two relations as
input and produce a new relation as their result. The fundamental
operations in the relational algebra are select, project, union, set
difference, Cartesian product, and rename. In addition to the
fundamental operations, there are several other operations,
namely set intersection, natural join, division, and assignment. We
will define these operations in terms of the fundamental operations.
3.5.1 Fundamental Operations
The select, project, and rename operations are called unary
operations, because they operate on one relation. The other three
operations operate on pairs of relations and are, therefore, called
binary operations.
3.5.1.1 The Select Operation
The select operation selects tuples that satisfy a given
predicate. We use the lowercase Greek letter sigma (σ) to denote
selection. The predicate appears as a subscript to σ.
The argument relation is in parentheses after the σ. Thus, to select
those tuples of the loan relation where the branch is "Perryridge,"
we write
σ_{branch-name = "Perryridge"}(loan)
We can find all tuples in which the amount lent is more than
$1200 by writing σ_{amount > 1200}(loan)
In general, we allow comparisons using =, ≠, <, ≤, >, ≥ in the
selection predicate.
Furthermore, we can combine several predicates into a larger
predicate by using the connectives and (∧), or (∨), and not (¬).
Thus, to find those tuples pertaining to loans of more than $1200
made by the Perryridge branch, we write:
σ_{branch-name = "Perryridge" ∧ amount > 1200}(loan)
Figure: Result of σ_{branch-name = "Perryridge"}(loan).
The selection predicate may include comparisons between
two attributes. To illustrate, consider the relation loan-officer that
consists of three attributes: customer-name, banker-name, and
loan-number, which specifies that a particular banker is the loan
officer for a loan that belongs to some customer. To find all
customers who have the same name as their loan officer, we can
write σ_{customer-name = banker-name}(loan-officer).
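The select operation can be mimicked in a few lines of Python by treating a relation as a list of tuples (dictionaries here) and the predicate as a function. This is a sketch with made-up loan data; the function name select and the underscore attribute names are choices made for the example.

```python
# A small loan relation (illustrative tuples).
loan = [
    {"loan_number": "L-11", "branch_name": "Round Hill", "amount": 900},
    {"loan_number": "L-14", "branch_name": "Downtown",   "amount": 1500},
    {"loan_number": "L-15", "branch_name": "Perryridge", "amount": 1500},
    {"loan_number": "L-16", "branch_name": "Perryridge", "amount": 1000},
]

def select(predicate, relation):
    """Return the tuples of relation for which predicate holds."""
    return [t for t in relation if predicate(t)]

# sigma_{branch-name = "Perryridge"}(loan)
perryridge = select(lambda t: t["branch_name"] == "Perryridge", loan)

# sigma_{amount > 1200}(loan)
large = select(lambda t: t["amount"] > 1200, loan)

# sigma_{branch-name = "Perryridge" AND amount > 1200}(loan)
both = select(
    lambda t: t["branch_name"] == "Perryridge" and t["amount"] > 1200, loan
)

perryridge_numbers = [t["loan_number"] for t in perryridge]
large_numbers = [t["loan_number"] for t in large]
both_numbers = [t["loan_number"] for t in both]
```

In SQL the same predicate would appear in a WHERE clause.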
Relational algebra, an offshoot of first-order logic (and of
algebra of sets), deals with a set of finitary relations which is closed
under certain operators. These operators operate on one or more
relations to yield a relation.
As in any algebra, some operators are primitive and the
others, being definable in terms of the primitive ones, are derived. It
is useful if the choice of primitive operators parallels the usual
choice of primitive logical operators. Although it is well known that
the usual choice in logic of AND, OR and NOT is somewhat
arbitrary, Codd made a similar arbitrary choice for his algebra.
The six primitive operators of Codd's algebra are the
selection, the projection, the Cartesian product (also called the
cross product or cross join), the set union, the set difference, and
the rename. (Actually, Codd omitted the rename, but the compelling
case for its inclusion was shown by the inventors of ISBL.) These
six operators are fundamental in the sense that none of them can
be omitted without losing expressive power. Many other operators
have been defined in terms of these six. Among the most important
are set intersection, division, and the natural join. In fact ISBL made
a compelling case for replacing the Cartesian product with the
natural join, of which the Cartesian product is a degenerate case.
Altogether, the operators of relational algebra have identical
expressive power to that of domain relational calculus or tuple
relational calculus. However, relational algebra has strictly less
expressive power than that of first-order predicate calculus without
function symbols. Relational algebra actually corresponds to a subset
of first-order logic, namely the domain-independent (safe) formulas
of the relational calculus.
Set operators
Although three of the six basic operators are taken from set
theory, there are additional constraints that are present in their
relational algebra counterparts: For set union and set difference,
the two relations involved must be union-compatible, that is, the
two relations must have the same set of attributes. As set
intersection can be defined in terms of set difference, the two
relations involved in set intersection must also be union-compatible.
The Cartesian product is defined differently from the one defined in
set theory in the sense that tuples are considered to be 'shallow' for
the purposes of the operation. That is, unlike in set theory, where
the Cartesian product of a set of n-tuples with a set of m-tuples is a
set of 2-tuples, the Cartesian product in relational algebra has the
2-tuple "flattened" into an (n+m)-tuple. More formally, R × S is
defined as follows:
R × S = {r ∪ s | r ∈ R, s ∈ S}
In addition, for the Cartesian product to be defined, the two
relations involved must have disjoint headers, that is, they must
not have a common attribute name.
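The flattening and the disjoint-header requirement can be sketched in Python, with each tuple represented as a dictionary from attribute name to value. The function name cartesian_product and the two small relations are assumptions for illustration.

```python
def cartesian_product(r, s):
    """Pair every tuple of r with every tuple of s, flattened into one tuple."""
    r_header = set().union(*(t.keys() for t in r))
    s_header = set().union(*(t.keys() for t in s))
    shared = r_header & s_header
    if shared:
        # The product is undefined when an attribute name occurs in both.
        raise ValueError(f"headers are not disjoint: {sorted(shared)}")
    return [{**tr, **ts} for tr in r for ts in s]

borrower = [
    {"customer_name": "Jones", "loan_number": "L-17"},
    {"customer_name": "Smith", "loan_number": "L-23"},
]
branch = [
    {"branch_name": "Downtown",   "branch_city": "Brooklyn"},
    {"branch_name": "Perryridge", "branch_city": "Horseneck"},
    {"branch_name": "Mianus",     "branch_city": "Horseneck"},
]

pairs = cartesian_product(borrower, branch)

# A relation cannot be multiplied with itself without renaming first.
try:
    cartesian_product(borrower, borrower)
    self_product_rejected = False
except ValueError:
    self_product_rejected = True
```

Each result tuple has 2 + 2 = 4 attributes, and the result has 2 × 3 = 6 tuples.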
Projection (π)
A projection is a unary operation written as π_{a1,...,an}(R),
where a1,...,an is a set of attribute names. The result of
such a projection is defined as the set that is obtained when all tuples
in R are restricted to the set {a1,...,an}.
Selection (σ)
A generalized selection is a unary operation written as σ_φ(R),
where φ is a propositional formula that consists of atoms as
allowed in the normal selection and the logical operators ∧ (and),
∨ (or) and ¬ (negation). This selection selects all those tuples in R
for which φ holds.
Rename (ρ)
A rename is a unary operation written as ρ_{a / b}(R) where the
result is identical to R except that the b field in all tuples is renamed
to an a field. This is simply used to rename the attribute of a
relation or the relation itself.
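Projection and rename can be sketched in Python in the same style, with tuples as dictionaries. The function names project and rename and the sample loan relation are assumptions for illustration; note that projection removes duplicate tuples because a relation is a set.

```python
loan = [
    {"loan_number": "L-15", "branch_name": "Perryridge", "amount": 1500},
    {"loan_number": "L-16", "branch_name": "Perryridge", "amount": 1300},
    {"loan_number": "L-17", "branch_name": "Downtown",   "amount": 1000},
]

def project(attributes, relation):
    """Restrict every tuple to the given attributes, dropping duplicates."""
    result = []
    for t in relation:
        restricted = {a: t[a] for a in attributes}
        if restricted not in result:
            result.append(restricted)
    return result

def rename(new, old, relation):
    """Return relation with attribute old renamed to new in every tuple."""
    return [
        {(new if name == old else name): value for name, value in t.items()}
        for t in relation
    ]

# pi_{branch_name}(loan): two Perryridge loans collapse into one tuple.
branches = project(["branch_name"], loan)

# rho_{branch / branch_name}(loan)
renamed = rename("branch", "branch_name", loan)
```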
4
SQL
Topics covered
4.1 Data Definition Commands
4.2 Constraints
4.3 View
4.4 Data Manipulation Commands
4.5 Queries
4.6 Aggregate Queries
4.7 NULL values
4.8 Outer Joins
4.9 Nested Queries- Correlated Queries
4.10 Embedded SQL
4.11 Dynamic SQL
4.12 TRIGGERS
4.1 DATA DEFINITION COMMANDS:

We specify a database schema by a set of definitions
expressed by a special language called a data-definition language
(DDL).
4.1.1 Create Table Statement
For instance, the following statement in the SQL language
defines the account table:
create table account
(account_number char(10),
balance integer)
Execution of the above DDL statement creates the account
table. In addition, it updates a special set of tables called the data
dictionary or data directory.
A data dictionary contains metadata, that is, data about
data. The schema of a table is an example of metadata. A
database system consults the data dictionary before reading or
modifying actual data.
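How a DDL statement updates the data dictionary can be observed in SQLite, whose catalog is a table named sqlite_master. This sketch uses Python's sqlite3 module; other systems expose their dictionaries under different names, and the column is spelled account_number because a hyphen is not valid in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Before any DDL runs, the catalog of a fresh database is empty.
before = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# The CREATE TABLE statement added a row of metadata to the catalog.
after = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# The stored schema text is itself metadata about the table.
schema_text = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'account'"
).fetchone()[0]
```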
We specify the storage structure and access methods used
by the database system by a set of statements in a special type of
DDL called a data storage and definition language. These
statements define the implementation details of the database
schemas, which are usually hidden from the users.
The data values stored in the database must satisfy certain
consistency constraints. For example, suppose the balance on an
account should not fall below $100. The DDL provides facilities to
specify such constraints. The database systems check these
constraints every time the database is updated.
4.1.2 DROP table statement:
Over time the structure of a database grows and changes.
New tables are created to represent new entities, and some old
tables are no longer needed. You can remove an unneeded table
from the database with the DROP TABLE statement.
Figure 13-3: DROP TABLE statement syntax diagram

The table name in the statement identifies the table to be
dropped. Normally you will be dropping one of your own tables and
will use an unqualified table name. With proper permission, you can
also drop a table owned by another user by specifying a qualified
table name.
For example:
DROP TABLE CUSTOMER
4.1.3 ALTER Table statement
Refer to the section: 3.2.1 Modifying Relations: (ALTER TABLE
STATEMENT)
4.2 CONSTRAINTS
A SQL2 check constraint is a search condition, like the
search condition in a WHERE clause, that produces a true/false
value. When a check constraint is specified for a column, the
DBMS automatically checks the value of that column each time a
new row is inserted or a row is updated to ensure that the search
condition is true. If not, the INSERT or UPDATE statement fails. A
column check constraint is specified as part of the column definition
within the CREATE TABLE statement.
Consider this excerpt from a CREATE TABLE statement, which includes
three check constraints:
CREATE TABLE SALESREPS
(EMPL_NUM INTEGER NOT NULL
CHECK (EMPL_NUM BETWEEN 101 AND 199),
AGE INTEGER
CHECK (AGE >= 21),
QUOTA MONEY
CHECK (QUOTA >= 0.0) )
The first constraint (on the EMPL_NUM column) requires
that valid employee numbers be three-digit numbers between 101
and 199. The second constraint (on the AGE column) similarly
prevents hiring of minors. The third constraint (on the QUOTA
column) prevents a salesperson from having a quota target less
than $0.00.
All three of these column check constraints are very simple
examples of the capability specified by the SQL2 standard. In
general, the parentheses following the keyword CHECK can
contain any valid search condition that makes sense in the context
of a column definition. With this flexibility, a check constraint can
compare values from two different columns of the table, or even
compare a proposed data value against other values from the
database.
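The behaviour of column check constraints can be sketched as follows, assuming Python's built-in sqlite3 module as the DBMS. The MONEY type is replaced by NUMERIC because SQLite has no MONEY type, and the helper try_insert and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.execute(
    """
    CREATE TABLE SALESREPS (
        EMPL_NUM INTEGER NOT NULL CHECK (EMPL_NUM BETWEEN 101 AND 199),
        AGE      INTEGER CHECK (AGE >= 21),
        QUOTA    NUMERIC CHECK (QUOTA >= 0.0)
    )
    """
)

def try_insert(empl_num, age, quota):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO SALESREPS VALUES (?, ?, ?)", (empl_num, age, quota))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert(111, 36, 300000.0)    # all three checks hold
too_young = try_insert(112, 19, 300000.0)   # violates AGE >= 21
bad_number = try_insert(250, 40, 300000.0)  # violates the EMPL_NUM range
null_age = try_insert(113, None, None)      # NULL passes: the check is unknown, not false

row_count = conn.execute("SELECT COUNT(*) FROM SALESREPS").fetchone()[0]
```

Only the first and last rows are stored. A check constraint rejects a row only when its search condition is false, so a NULL AGE or QUOTA is accepted unless the column is also declared NOT NULL.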
4.3 VIEW
A view is a "virtual table" in the database whose contents are
defined by a query.
The tables of a database define the structure and
organization of its data. However, SQL also lets you look at the
stored data in other ways by defining alternative views of the data.
A view is a SQL query that is permanently stored in the database
and assigned a name. The results of the stored query are "visible"
through the view, and SQL lets you access these query results as if
they were, in fact, a "real" table in the database.
Views are an important part of SQL, for several reasons:
Views let you tailor the appearance of a database so that different
users see it from different perspectives.
Views let you restrict access to data, allowing different users to
see only certain rows or certain columns of a table.
Views simplify database access by presenting the structure of the
stored data in the way that is most natural for each user.
4.3.1 Advantages of VIEW
Views provide a variety of benefits and can be useful in
many different types of databases. In a personal computer
database, views are usually a convenience, defined to simplify
database requests. In a production database installation, views play
a central role in defining the structure of the database for its users
and enforcing its security. Views provide these major benefits:
Security: Each user can be given permission to access the
database only through a small set of views that contain the specific
data the user is authorized to see, thus restricting the user's access
to stored data.
Query simplicity: A view can draw data from several different
tables and present it as a single table, turning multi-table queries
into single-table queries against the view.
Structural simplicity: Views can give a user a "personalized"
view of the database structure, presenting the database as a set of
virtual tables that make sense for that user.
Insulation from change: A view can present a consistent,
unchanged image of the structure of the database, even if the
underlying source tables are split, restructured, or renamed.
Data integrity: If data is accessed and entered through a view,
the DBMS can automatically check the data to ensure that it meets
specified integrity constraints.
4.3.2 Disadvantages of VIEW
While views provide substantial advantages, there are also
two major disadvantages to using a view instead of a real table:
Performance: Views create the appearance of a table, but the
DBMS must still translate queries against the view into queries
against the underlying source tables. If the view is defined by a
complex, multi-table query, then even a simple query against the
view becomes a complicated join, and it may take a long time to
complete.
Update restrictions: When a user tries to update rows of a view,
the DBMS must translate the request into an update on rows of the
underlying source tables. This is possible for simple views, but
more complex views cannot be updated; they are "read-only."
These disadvantages mean that you cannot indiscriminately
define views and use them instead of the source tables. Instead,
you must in each case consider the advantages provided by using
a view and weigh them against the disadvantages.
4.3.3 Creating a VIEW
The CREATE VIEW statement is used to create a view. The
statement assigns a name to the view and specifies the query that
defines the view. To create the view successfully, you must have
permission to access all of the tables referenced in the query.
The CREATE VIEW statement can optionally assign a name
to each column in the newly created view. If a list of column names
is specified, it must have the same number of items as the number
of columns produced by the query. Note that only the column
names are specified; the data type, length, and other
characteristics of each column are derived from the definition of the
columns in the source tables. If the list of column names is omitted
from the CREATE VIEW statement, each column in the view takes
the name of the corresponding column in the query. The list of
column names must be specified if the query includes calculated
columns or if it produces two columns with identical names.
For example:
Define a view containing only Eastern region offices.
CREATE VIEW EASTOFFICES AS
SELECT *
FROM OFFICES
WHERE REGION = 'Eastern'
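The EASTOFFICES view can be exercised with a short sketch, assuming Python's built-in sqlite3 module as the DBMS. The OFFICES rows, including the Boston office with number 14, are illustrative sample data rather than the contents of the sample database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OFFICES (OFFICE INTEGER, CITY TEXT, REGION TEXT)")
conn.executemany(
    "INSERT INTO OFFICES VALUES (?, ?, ?)",
    [
        (11, "New York", "Eastern"),
        (12, "Chicago", "Eastern"),
        (13, "Atlanta", "Eastern"),
        (21, "Los Angeles", "Western"),
        (22, "Denver", "Western"),
    ],
)

# The view is a named, stored query; no rows are copied.
conn.execute(
    "CREATE VIEW EASTOFFICES AS SELECT * FROM OFFICES WHERE REGION = 'Eastern'"
)

east_cities = [
    row[0] for row in conn.execute("SELECT CITY FROM EASTOFFICES ORDER BY OFFICE")
]

# A new Eastern office in the source table becomes visible through the view.
conn.execute("INSERT INTO OFFICES VALUES (14, 'Boston', 'Eastern')")
east_count = conn.execute("SELECT COUNT(*) FROM EASTOFFICES").fetchone()[0]
```

Because the view stores the query and not its results, the row inserted into OFFICES afterwards is counted when the view is queried again.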
4.4 DATA MANIPULATION COMMANDS

DML commands are used to manipulate data in the
database.
4.4.1 Insert Statement
The INSERT statement adds a new row to a table. The
INTO clause specifies the table that receives the new row (the
target table), and the VALUES clause specifies the data values that
the new row will contain. The column list indicates which data value
goes into which column of the new row.
For example:
INSERT INTO SALESREPS (NAME, AGE, EMPL_NUM, SALES, TITLE,
HIRE_DATE, REP_OFFICE)
VALUES ('Henry Jacobsen', 36, 111, 0.00, 'Sales Mgr',
'25-JUL-90', 13)
1 row inserted.
The INSERT statement builds a single row of data that
matches the column structure of the table, fills it with the data from
the VALUES clause, and then adds the new row to the table. The
rows of a table are unordered, so there is no notion of inserting the
row "at the top" or "at the bottom" or "between two rows" of the
table. After the INSERT statement, the new row is simply a part of
the table. A subsequent query against the SALESREPS table will
include the new row, but it may appear anywhere among the rows
of query results.
4.4.2 Delete Statement
The DELETE statement removes selected rows of data from
a single table. The FROM clause specifies the target table
containing the rows. The WHERE clause specifies which rows of
the table are to be deleted.
For example:
Remove Henry Jacobsen from the database.
DELETE FROM SALESREPS
WHERE NAME = 'Henry Jacobsen'
1 row deleted.
The WHERE clause in this example identifies a single row of
the SALESREPS table, which SQL removes from the table.
Omitting the WHERE clause deletes all the rows from a table.
For example:
DELETE FROM ORDERS
30 rows deleted.
4.4.3 Update Statement
The UPDATE statement modifies the values of one or more
columns in selected rows of a single table. The target table to be
updated is named in the statement, and you must have the required
permission to update the table as well as each of the individual
columns that will be modified. The WHERE clause selects the rows
of the table to be modified. The SET clause specifies which
columns are to be updated and calculates the new values for them.
For example:
Here is a simple UPDATE statement that changes the credit
limit and salesperson for a customer:
Raise the credit limit for Acme Manufacturing to $60,000 and
reassign them to Mary Jones (employee number 109).
UPDATE CUSTOMERS
SET CREDIT_LIMIT = 60000.00, CUST_REP = 109
WHERE COMPANY = 'Acme Mfg.'
1 row updated.
In this example, the WHERE clause identifies a single row of
the CUSTOMERS table, and the SET clause assigns new values to
two of the columns in that row.
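The three DML statements can be tried together in a small sketch, assuming Python's built-in sqlite3 module as the DBMS and a cut-down SALESREPS table whose column subset is an assumption. The cursor attribute rowcount corresponds to the "1 row inserted / updated / deleted" messages shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALESREPS (EMPL_NUM INTEGER, NAME TEXT, AGE INTEGER, SALES REAL)"
)

# INSERT: the column list says which value goes into which column.
cur = conn.execute(
    "INSERT INTO SALESREPS (NAME, AGE, EMPL_NUM, SALES) VALUES (?, ?, ?, ?)",
    ("Henry Jacobsen", 36, 111, 0.00),
)
inserted = cur.rowcount

# UPDATE: SET computes the new values, WHERE selects the rows.
cur = conn.execute("UPDATE SALESREPS SET SALES = SALES + 1000 WHERE EMPL_NUM = 111")
updated = cur.rowcount
sales_now = conn.execute(
    "SELECT SALES FROM SALESREPS WHERE EMPL_NUM = 111"
).fetchone()[0]

# DELETE: WHERE selects the rows to remove; without it every row would be removed.
cur = conn.execute("DELETE FROM SALESREPS WHERE NAME = 'Henry Jacobsen'")
deleted = cur.rowcount
remaining = conn.execute("SELECT COUNT(*) FROM SALESREPS").fetchone()[0]
```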
4.5 QUERIES
Select-From-Where Statements
The SELECT statement retrieves data from a database and returns
it to you in the form of query results.
The SELECT clause lists the data items to be retrieved by
the SELECT statement. The items may be columns from the
database, or columns to be calculated by SQL as it performs the
query.
The FROM clause lists the tables that contain the data to be
retrieved by the query.
The WHERE clause tells SQL to include only certain rows of data
in the query results. A search condition is used to specify the
desired rows.
For Example:
SELECT NAME, HIRE_DATE
FROM SALESREPS
WHERE SALES > 500000.00
4.6 AGGREGATE QUERIES

Aggregate functions are functions that take a collection (a
set or multi set) of values as input and return a single value. SQL
offers five built-in aggregate functions:
Average: avg
Minimum: min
Maximum: max
Total: sum
Count: count
Consider the query "Find the average account balance at the
Perryridge branch." We write this query as follows:
select avg (balance)
from account
where branch_name = 'Perryridge'
The result of this query is a relation with a single attribute,
containing a single tuple with a numerical value corresponding to
the average balance at the Perryridge branch.
Consider the query "Find the minimum salary offered to an
employee." We write this query as follows:
select min (salary)
from employee
The result is the minimum salary offered to an employee.
Consider the query "Find the maximum salary offered to an
employee." We write this query as follows:
select max (salary)
from employee
The result is the maximum salary offered to an employee.
To find the number of tuples in the customer relation, we write
select count (*)
from customer
The result is the total number of customers present in the customer table.
To find the total salary issued to the employees, we write the
query:
select sum (salary)
from employee
The result is the sum of the salaries offered to all the employees.
4.7 NULL VALUES:

Because a database is usually a model of a real-world
situation, certain pieces of data are inevitably missing, unknown, or
don't apply. In the sample database, for example, the QUOTA
column in the SALESREPS table contains the sales goal for each
salesperson. However, the newest salesperson has not yet been
assigned a quota; this data is missing for that row of the table. You
might be tempted to put a zero in the column for this salesperson,
but that would not be an accurate reflection of the situation. The
salesperson does not have a zero quota; the quota is just "not yet
known."
Similarly, the MANAGER column in the SALESREPS table
contains the employee number of each salesperson's manager. But
Sam Clark, the Vice President of Sales, has no manager in the
sales organization. This column does not apply to Sam. Again, you
might think about entering a zero, or a 9999 in the column, but
neither of these values would really be the employee number of
Sam's boss. No data value is applicable to this row.
SQL supports missing, unknown, or inapplicable data
explicitly, through the concept of a null value. A null value is an
indicator that tells SQL (and the user) that the data is missing or not
applicable. As a convenience, a missing piece of data is often said
to have the value NULL. But the NULL value is not a real data
value like 0, 473.83, or "Sam Clark." Instead, it's a signal, or a
reminder, that the data value is missing or unknown.
In many situations NULL values require special handling by
the DBMS. For example, if the user requests the sum of the
QUOTA column, how should the DBMS handle the missing data
when computing the sum? The answer is given by a set of special
rules that govern NULL value handling in various SQL statements
and clauses. Because of these rules, some leading database
authorities feel strongly that NULL values should not be used.
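How such rules work out in practice can be seen in a small sketch, assuming Python's built-in sqlite3 module as the DBMS. SQLite follows the usual SQL rule that aggregate functions skip NULL values. The three rows and their quota figures are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALESREPS (NAME TEXT, QUOTA REAL)")
conn.executemany(
    "INSERT INTO SALESREPS VALUES (?, ?)",
    [("Bill Adams", 350000.0), ("Mary Jones", 300000.0), ("Tom Snyder", None)],
)

# SUM, AVG and COUNT(column) ignore the NULL quota; COUNT(*) counts every row.
total, average, count_rows, count_quota = conn.execute(
    "SELECT SUM(QUOTA), AVG(QUOTA), COUNT(*), COUNT(QUOTA) FROM SALESREPS"
).fetchone()

# A comparison with NULL is unknown, so "= NULL" selects nothing; IS NULL is the test.
eq_null = conn.execute("SELECT COUNT(*) FROM SALESREPS WHERE QUOTA = NULL").fetchone()[0]
is_null = conn.execute("SELECT COUNT(*) FROM SALESREPS WHERE QUOTA IS NULL").fetchone()[0]
```

The average is taken over two rows, not three, which is one reason a zero would have been the wrong stand-in for the missing quota.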
4.8 OUTER JOINS

The process of forming pairs of rows by matching the
contents of related columns is called joining the tables. The
resulting table (containing data from both of the original tables) is
called a join between the two tables.
The SQL join operation combines information from two
tables by forming pairs of related rows from the two tables. The row
pairs that make up the joined table are those where the matching
columns in each of the two tables have the same value. If one of
the rows of a table is unmatched in this process, the join can
produce unexpected results, as illustrated by these queries:
List the salespeople and the offices where they work.
SELECT NAME, REP_OFFICE
FROM SALESREPS
NAME           REP_OFFICE
-------------  ----------
Bill Adams             13
Mary Jones             11
Sue Smith              21
Sam Clark              11
Bob Smith              12
Dan Roberts            12
Tom Snyder           NULL
Larry Fitch            21
Paul Cruz              12
Nancy Angelli          22
List the salespeople and the cities where they work.

SELECT NAME, CITY
FROM SALESREPS, OFFICES
WHERE REP_OFFICE = OFFICE
NAME           CITY
-------------  -----------
Mary Jones     New York
Sam Clark      New York
Bob Smith      Chicago
Paul Cruz      Chicago
Dan Roberts    Chicago
Bill Adams     Atlanta
Sue Smith      Los Angeles
Larry Fitch    Los Angeles
Nancy Angelli  Denver
The outer join query that combines the results of the above queries
and joins the two tables is as follows:
SELECT NAME, CITY
FROM SALESREPS, OFFICES
WHERE REP_OFFICE *= OFFICE
NAME           CITY
-------------  -----------
Tom Snyder     NULL
Mary Jones     New York
Sam Clark      New York
Bob Smith      Chicago
Paul Cruz      Chicago
Dan Roberts    Chicago
Bill Adams     Atlanta
Sue Smith      Los Angeles
Larry Fitch    Los Angeles
Nancy Angelli  Denver
4.8.1 Left and Right outer join
A full outer join treats the two tables symmetrically: it contains
the matched row pairs of the inner join plus NULL-extended copies of
the unmatched rows from both tables. Two other well-defined outer
joins do not treat the two tables symmetrically.
The left outer join between two tables starts from the inner join
and adds NULL-extended copies of the unmatched rows from the first
(left) table, but it does not include any unmatched rows from the
second (right) table. Here is a left outer join between the GIRLS
and BOYS tables:
List girls and boys in the same city and any unmatched girls.
SELECT *
FROM GIRLS, BOYS
WHERE GIRLS.CITY *= BOYS.CITY
GIRLS.NAME  GIRLS.CITY  BOYS.NAME  BOYS.CITY
----------  ----------  ---------  ---------
Mary        Boston      John       Boston
Mary        Boston      Henry      Boston
Susan       Chicago     Sam        Chicago
Betty       Chicago     Sam        Chicago
Anne        Denver      NULL       NULL
Nancy       NULL        NULL       NULL
The query produces six rows of query results, showing the
matched girl/boy pairs and the unmatched girls. The unmatched
boys are missing from the results.
Similarly, the right outer join between two tables starts from the
inner join and adds NULL-extended copies of the unmatched rows from
the second (right) table, but it does not include the unmatched rows
of the first (left) table. Here is a right outer join between the
GIRLS and BOYS tables:
List girls and boys in the same city and any unmatched boys.
SELECT *
FROM GIRLS, BOYS
WHERE GIRLS.CITY =* BOYS.CITY
GIRLS.NAME  GIRLS.CITY  BOYS.NAME  BOYS.CITY
----------  ----------  ---------  ---------
Mary        Boston      John       Boston
Mary        Boston      Henry      Boston
Susan       Chicago     Sam        Chicago
Betty       Chicago     Sam        Chicago
NULL        NULL        James      Dallas
NULL        NULL        George     NULL
This query also produces six rows of query results, showing
the matched girl/boy pairs and the unmatched boys. This time the
unmatched girls are missing from the results.
As noted before, the left and right outer joins do not treat the
two joined tables symmetrically. It is often useful to think about one
of the tables being the "major" table (the one whose rows are all
represented in the query results) and the other table being the
"minor" table (the one whose columns contain NULL values in the
joined query results). In a left outer join, the left (first-mentioned)
table is the major table, and the right (later-named) table is the
minor table. The roles are reversed in a right outer join (right table
is major, left table is minor).
In practice, the left and right outer joins are more useful than
the full outer join, especially when joining data from two tables
using a parent/child (primary key/foreign key) relationship. To
illustrate, consider once again the sample database. We have
already seen one example involving the SALESREPS and
OFFICES table. The REP_OFFICE column in the SALESREPS
table is a foreign key to the OFFICES table; it tells the office where
each salesperson works, and it is allowed to have a NULL value for
a new salesperson who has not yet been assigned to an office.
Tom Snyder is such a salesperson in the sample database. Any
join that exercises this SALESREPS-to-OFFICES relationship and
expects to include data for Tom Snyder must be an outer join, with
the SALESREPS table as the major table. Here is the example
used earlier:
SELECT NAME, CITY
FROM SALESREPS, OFFICES
WHERE REP_OFFICE *= OFFICE
NAME           CITY
-------------  -----------
Tom Snyder     NULL
Mary Jones     New York
Sam Clark      New York
Bob Smith      Chicago
Paul Cruz      Chicago
Dan Roberts    Chicago
Bill Adams     Atlanta
Sue Smith      Los Angeles
Larry Fitch    Los Angeles
Nancy Angelli  Denver
Note in this case (a left outer join), the "child" table
(SALESREPS, the table with the foreign key) is the major table in
the outer join, and the "parent" table (OFFICES) is the minor table.
The objective is to retain rows containing NULL foreign key values
(like Tom Snyder's) from the child table in the query results, so the
child table becomes the major table in the outer join. It doesn't
matter whether the query is actually expressed as a left outer join
(as it was previously) or as a right outer join like this:
SELECT NAME, CITY
FROM SALESREPS, OFFICES
WHERE OFFICE =* REP_OFFICE
NAME           CITY
-------------  -----------
Tom Snyder     NULL
Mary Jones     New York
Sam Clark      New York
Bob Smith      Chicago
Paul Cruz      Chicago
Dan Roberts    Chicago
Bill Adams     Atlanta
Sue Smith      Los Angeles
Larry Fitch    Los Angeles
Nancy Angelli  Denver
What matters is that the child table is the major table in the outer
join.
There are also useful joined queries where the parent is the
major table and the child table is the minor table. For example,
suppose the company in the sample database opens a new sales
office in Dallas, but initially the office has no salespeople assigned
to it. If you want to generate a report listing all of the offices and the
names of the salespeople who work there, you might want to
include a row representing the Dallas office. Here is the outer join
query that produces those results:
List the offices and the salespeople who work in each one.
SELECT CITY, NAME
FROM OFFICES, SALESREPS
WHERE OFFICE *= REP_OFFICE
CITY         NAME
-----------  -------------
New York     Mary Jones
New York     Sam Clark
Chicago      Bob Smith
Chicago      Paul Cruz
Chicago      Dan Roberts
Atlanta      Bill Adams
Los Angeles  Sue Smith
Los Angeles  Larry Fitch
Denver       Nancy Angelli
Dallas       NULL
In this case, the parent table (OFFICES) is the major table in
the outer join, and the child table (SALESREPS) is the minor table.
The objective is to ensure that all rows from the OFFICES table are
represented in the query results, so it plays the role of major table.
The roles of the two tables are precisely reversed from the previous
example. Of course, the row for Tom Snyder, which was included in
the query results for the earlier example (when SALESREPS was
the major table), is missing from this set of query results because
SALESREPS is now the minor table.
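The *= and =* operators used in this section are an older, vendor-specific notation that many systems do not accept. The sketch below expresses the same major/minor idea with the LEFT OUTER JOIN ... ON form introduced by SQL2, assuming Python's built-in sqlite3 module as the DBMS. The tables are cut down to a few illustrative rows, and the Dallas office number 23 is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OFFICES (OFFICE INTEGER PRIMARY KEY, CITY TEXT)")
conn.execute("CREATE TABLE SALESREPS (NAME TEXT, REP_OFFICE INTEGER)")
conn.executemany(
    "INSERT INTO OFFICES VALUES (?, ?)",
    [(11, "New York"), (12, "Chicago"), (23, "Dallas")],
)
conn.executemany(
    "INSERT INTO SALESREPS VALUES (?, ?)",
    [("Mary Jones", 11), ("Bob Smith", 12), ("Tom Snyder", None)],
)

# Inner join: Tom Snyder (no office) and Dallas (no salespeople) both drop out.
inner = conn.execute(
    "SELECT NAME, CITY FROM SALESREPS JOIN OFFICES "
    "ON REP_OFFICE = OFFICE ORDER BY NAME"
).fetchall()

# SALESREPS as the major table: every salesperson appears, Tom with a NULL city.
reps_major = conn.execute(
    "SELECT NAME, CITY FROM SALESREPS LEFT OUTER JOIN OFFICES "
    "ON REP_OFFICE = OFFICE ORDER BY NAME"
).fetchall()

# OFFICES as the major table: every office appears, Dallas with a NULL name.
offices_major = conn.execute(
    "SELECT CITY, NAME FROM OFFICES LEFT OUTER JOIN SALESREPS "
    "ON OFFICE = REP_OFFICE ORDER BY CITY"
).fetchall()
```

In this form the major table is simply the one written to the left of LEFT OUTER JOIN. Python reports a NULL column value as None.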
4.9 NESTED QUERIES- CORRELATED QUERIES

4.9.1 Nested Queries (Sub Queries)
A subquery or a nested query is a query-within-a-query. The
results of the subquery are used by the DBMS to determine the
results of the higher-level query that contains the subquery. In the
simplest forms of a subquery, the subquery appears within the
WHERE or HAVING clause of another SQL statement. Subqueries
provide an efficient, natural way to handle query requests that are
themselves expressed in terms of the results of other queries. Here
is an example of such a request:
List the offices where the sales target for the office exceeds
the sum of the individual salespeople's quotas.
The request asks for a list of offices from the OFFICES
table, where the value of the TARGET column meets some
condition. It seems reasonable that the SELECT statement that
expresses the query should look something like this:
SELECT CITY
FROM OFFICES
WHERE TARGET > ???
The value "???" needs to be filled in and should be equal to
"the sum of the quotas of the salespeople assigned to the office in
question." How can you specify that value in the query? The sum of
the quotas for a specific office (say, office number 21) can be
obtained with this query:
SELECT SUM(QUOTA)
FROM SALESREPS
WHERE REP_OFFICE = 21
But how can you put the results of this query into the earlier
query in place of the question marks? It would seem reasonable to
start with the first query and replace the "???" with the second
query, as follows:
SELECT CITY
FROM OFFICES
WHERE TARGET > (SELECT SUM (QUOTA)
FROM SALESREPS
WHERE REP_OFFICE = OFFICE)
There are a few differences between a nested query (subquery) and an
ordinary SELECT statement:
In the most common uses, a nested query or subquery must
produce a single column of data as its query results. This
means that a subquery almost always has a single select item in
its SELECT clause.
The ORDER BY clause cannot be specified in a nested query or
subquery. The subquery results are used internally by the main
query and are never visible to the user, so it makes little sense
to sort them anyway.
Column names appearing in a subquery may refer to columns of
tables in the main query.
In most implementations, a subquery cannot be the UNION of
several different SELECT statements; only a single SELECT is
allowed. (The SQL2 standard allows much more powerful query
expressions and relaxes this restriction.)
4.9.2 Correlated Queries:
In concept, SQL performs a subquery over and over again,
once for each row of the main query. For many subqueries,
however, the subquery produces the same results for every row or
row group. Here is an example:
List the sales offices whose sales are below the average target.
SELECT CITY
FROM OFFICES
WHERE SALES < (SELECT AVG(TARGET)
FROM OFFICES)
CITY
-------
Denver
Atlanta
In this query, it would be silly to perform the subquery five
times (once for each office). The average target doesn't change
with each office; it's completely independent of the office currently
being tested. As a result, SQL can handle the query by first
performing the subquery, yielding the average target ($550,000),
and then converting the main query into:
SELECT CITY
FROM OFFICES
WHERE SALES < 550000.00
Commercial SQL implementations automatically detect this
situation and use this shortcut whenever possible to reduce the
amount of processing required by a subquery. However, the
shortcut cannot be used if the subquery contains an outer
reference, as in this example:
List all of the offices whose targets exceed the sum of the quotas of
the salespeople who work in them:
SELECT CITY
FROM OFFICES
WHERE TARGET > (SELECT SUM(QUOTA)
FROM SALESREPS
WHERE REP_OFFICE = OFFICE)
CITY
-----------
Chicago
Los Angeles
For each row of the OFFICES table to be tested by the
WHERE clause of the main query, the OFFICE column (which
appears in the subquery as an outer reference) has a different
value. Thus SQL has no choice but to carry out this subquery five
times, once for each row in the OFFICES table. A subquery
containing an outer reference is called a correlated subquery
because its results are correlated with each individual row of the
main query. For the same reason, an outer reference is sometimes
called a correlated reference.
A subquery can contain an outer reference to a table in the
FROM clause of any query that contains the subquery, no matter
how deeply the subqueries are nested. A column name in a
fourth-level subquery, for example, may refer to one of the tables named
in the FROM clause of the main query, or to a table named in the
FROM clause of the second-level subquery or the third-level
subquery that contains it. Regardless of the level of nesting, an
outer reference always takes on the value of the column in the
"current" row of the table being tested.
Because a subquery can contain outer references, there is
even more potential for ambiguous column names in a subquery
than in a main query. When an unqualified column name appears
within a subquery, SQL must determine whether it refers to a table
in the subquery's own FROM clause, or to a FROM clause in a
query containing the subquery. To minimize the possibility of
confusion, SQL always interprets a column reference in a subquery
using the nearest FROM clause possible. To illustrate this point, in
this example the same table is used in the query and in the
subquery:
List the salespeople who are over 40 and who manage a
salesperson over quota.
SELECT NAME
FROM SALESREPS
WHERE AGE > 40
AND EMPL_NUM IN (SELECT MANAGER
FROM SALESREPS
WHERE SALES > QUOTA)
NAME
-----------
Sam Clark
Larry Fitch
The MANAGER, QUOTA, and SALES columns in the
subquery are references to the SALESREPS table in the
subquery's own FROM clause; SQL does not interpret them as outer
references, and the subquery is not a correlated subquery. As
discussed earlier, SQL can perform the subquery first in this case,
finding the salespeople who are over quota and generating a list of
the employee numbers of their managers. SQL can then turn its
attention to the main query, selecting managers whose employee
numbers appear in the generated list.
If you want to use an outer reference within a subquery like
the one in the previous example, you must use a table alias to force
the outer reference. This request, which adds one more qualifying
condition to the previous one, shows how:
List the managers who are over 40 and who manage a salesperson
who is over quota and who does not work in the same sales office
as the manager.
SELECT NAME
FROM SALESREPS MGRS
WHERE AGE > 40
AND MGRS.EMPL_NUM IN (SELECT MANAGER
FROM SALESREPS EMPS
WHERE EMPS.SALES > EMPS.QUOTA
AND EMPS.REP_OFFICE <> MGRS.REP_OFFICE)
NAME
-----------
Sam Clark
Larry Fitch
The copy of the SALESREPS table used in the main query
now has the tag MGRS, and the copy in the subquery has the tag
EMPS. The subquery contains one additional search condition,
requiring that the employee's office number does not match that of
the manager. The qualified column name MGRS.REP_OFFICE in the
subquery is an outer reference, and this subquery is a correlated
subquery.
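The row-by-row evaluation of a correlated subquery can be made concrete with a sketch that runs the office-target query twice: once as a single SQL statement, and once with the outer loop written out by hand. It assumes Python's built-in sqlite3 module as the DBMS, and the two offices and four salespeople are illustrative figures, not the sample database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OFFICES (OFFICE INTEGER, CITY TEXT, TARGET REAL)")
conn.execute("CREATE TABLE SALESREPS (NAME TEXT, REP_OFFICE INTEGER, QUOTA REAL)")
conn.executemany(
    "INSERT INTO OFFICES VALUES (?, ?, ?)",
    [(11, "New York", 500.0), (12, "Chicago", 800.0)],
)
conn.executemany(
    "INSERT INTO SALESREPS VALUES (?, ?, ?)",
    [("Mary Jones", 11, 300.0), ("Sam Clark", 11, 250.0),
     ("Bob Smith", 12, 200.0), ("Paul Cruz", 12, 300.0)],
)

# Correlated subquery: OFFICE is an outer reference to the current OFFICES row.
correlated = conn.execute(
    """
    SELECT CITY FROM OFFICES
    WHERE TARGET > (SELECT SUM(QUOTA) FROM SALESREPS
                    WHERE REP_OFFICE = OFFICE)
    """
).fetchall()

# The same evaluation spelled out: run the inner query once per outer row.
by_hand = []
offices = conn.execute("SELECT OFFICE, CITY, TARGET FROM OFFICES").fetchall()
for office, city, target in offices:
    quota_sum = conn.execute(
        "SELECT SUM(QUOTA) FROM SALESREPS WHERE REP_OFFICE = ?", (office,)
    ).fetchone()[0]
    if quota_sum is not None and target > quota_sum:
        by_hand.append((city,))
```

Both approaches select only Chicago here, since New York's quotas (550) exceed its target (500). The hand-written loop is only a model of the concept; a real DBMS is free to choose a cheaper execution strategy.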
4.10 EMBEDDED SQL

The central idea of embedded SQL is to blend SQL
language statements directly into a program written in a "host"
programming language, such as C, Pascal, COBOL, FORTRAN,
PL/I, or Assembler. Embedded SQL uses the following techniques
to embed the SQL statements:
SQL statements are intermixed with statements of the host
language in the source program. This "embedded SQL source
program" is submitted to a SQL pre-compiler, which processes the
SQL statements.
Variables of the host programming language can be referenced in
the embedded SQL statements, allowing values calculated by the
program to be used by the SQL statements.
Program language variables also are used by the embedded SQL
statements to receive the results of SQL queries, allowing the
program to use and process the retrieved values.
Special program variables are used to assign NULL values to
database columns and to support the retrieval of NULL values from
the database.
Several new SQL statements that are unique to embedded SQL
are added to the interactive SQL language, to provide for
row-by-row processing of query results.
4.10.1 Developing an Embedded SQL Program
An embedded SQL program contains a mix of SQL and
programming language statements, so it can't be submitted directly
to a compiler for the programming language. Instead, it moves
through a multi-step development process, shown in the following
Figure. The steps in the figure are actually those used by the IBM
mainframe databases (DB2, SQL/DS), but all products that support
embedded SQL use a similar process:
Figure: The embedded SQL development process

1. The embedded SQL source program is submitted to the SQL
pre-compiler, a programming tool. The pre-compiler scans the program,
finds the embedded SQL statements, and processes them. A
different pre-compiler is required for each programming language
supported by the DBMS. Commercial SQL products typically offer
pre-compilers for one or more languages, including C, Pascal,
COBOL, FORTRAN, Ada, PL/I, RPG, and various assembly
languages.
2. The pre-compiler produces two files as its output. The first file is
the source program, stripped of its embedded SQL statements. In
their place, the pre-compiler substitutes calls to the "private" DBMS
routines that provide the run-time link between the program and the
DBMS. Typically, the names and calling sequences of these
routines are known only to the pre-compiler and the DBMS; they
are not a public interface to the DBMS. The second file is a copy of
all the embedded SQL statements used in the program. This file is
sometimes called a database request module, or DBRM.
3. The source file output from the pre-compiler is submitted to the
standard compiler for the host programming language (such as a C
or COBOL compiler). The compiler processes the source code and
produces object code as its output. Note that this step has nothing
in particular to do with the DBMS or with SQL.
4. The linker accepts the object modules generated by the
compiler, links them with various library routines, and produces
an executable program. The library routines linked into the
executable program include the "private" DBMS routines described
in Step 2.
5. The database request module generated by the pre-compiler is
submitted to a special BIND program. This program examines the
SQL statements, parses, validates, and optimizes them, and
produces an application plan for each statement. The result is a
combined application plan for the entire program, representing a
DBMS-executable version of its embedded SQL statements. The
BIND program stores the plan in the database, usually assigning it
the name of the application program that created it.
The program development steps in Figure correlate with the
DBMS statement processing steps in Figure. In particular, the
pre-compiler usually handles statement parsing (the first step), and the
BIND utility handles verification, optimization, and plan generation
(the second, third, and fourth steps). Thus the first four steps of
Figure all take place at compile time when you use embedded
SQL. Only the fifth step, the actual execution of the application
plan, remains to be done at run-time.
The embedded SQL development process turns the original
embedded SQL source program into two executable parts:
An executable program, stored in a file on the computer in the
same format as any executable program
An executable application plan, stored within the database in the
format expected by the DBMS
The embedded SQL development cycle may seem cumbersome, and it
is more awkward than developing a standard C or COBOL program.
In most cases, all of the steps in Figure are automated by a single
command procedure, so the individual steps are made invisible to the
application programmer. The process does have several major
advantages from a DBMS point of view:
- The blending of SQL and programming language statements in
the embedded SQL source program is an effective way to merge
the two languages. The host programming language provides flow
of control, variables, block structure, and input/output functions;
SQL handles database access and does not have to provide these
other constructs.
- The use of a pre compiler means that the compute-intensive work
of parsing and optimization can take place during the development
cycle. The resulting executable program is very efficient in its use of
CPU resources.
- The database request module produced by the pre compiler
provides portability of applications. An application program can be
written and tested on one system, and then its executable program
and DBRM can be moved to another system. After the BIND
program on the new system creates the application plan and
installs it in the database, the application program can use it without
being recompiled itself.
- The program's actual run-time interface to the private DBMS
routines is completely hidden from the application programmer. The
programmer works with embedded SQL at the source code level
and does not have to worry about other, more complex interfaces.
4.11 DYNAMIC SQL

The central concept of dynamic SQL is simple: don't hardcode an embedded SQL statement into the program's source code.
Instead, let the program build the text of a SQL statement in one of
its data areas at runtime. The program then passes the statement
text to the DBMS for execution "on the fly." Although the details get
quite complex, all of dynamic SQL is built on this simple concept,
and it's a good idea to keep it in mind.
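The idea of building statement text at run time can be sketched in Java, the host language used in Chapter 5. This is only an illustration under stated assumptions: the class name DynamicSqlSketch and the method buildSelect are hypothetical, and the sketch stops at producing the text; passing it to the DBMS is left to whatever interface the program uses. Table and column names are checked against a simple identifier pattern, and data values are left as '?' placeholders rather than being pasted into the text.

```java
import java.util.List;
import java.util.regex.Pattern;

class DynamicSqlSketch {
    // Accept only plain identifiers so that input cannot smuggle extra SQL into the text.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private static String checked(String name) {
        if (!IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("not a plain identifier: " + name);
        }
        return name;
    }

    /** Builds the text of a SELECT statement in a data area (a String) at run time. */
    static String buildSelect(String table, List<String> columns, List<String> filterColumns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        StringBuilder sql = new StringBuilder("SELECT ");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append(checked(columns.get(i)));
        }
        sql.append(" FROM ").append(checked(table));
        for (int i = 0; i < filterColumns.size(); i++) {
            sql.append(i == 0 ? " WHERE " : " AND ");
            sql.append(checked(filterColumns.get(i))).append(" = ?");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // The columns are chosen while the program runs, not when it is compiled.
        System.out.println(buildSelect("SALESREPS", List.of("EMPL_NUM", "SALES"), List.of("EMPL_NUM")));
    }
}
```

Because the column list is an ordinary run-time value, the same program can issue statements that were not known when it was compiled, which is exactly what static SQL cannot do.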
As you might expect, dynamic SQL is less efficient than
static SQL. For this reason, static SQL is used whenever possible,
and many application programmers never need to learn about
dynamic SQL. However, dynamic SQL has grown in importance as
more and more database access has moved to a client/server,
front-end/back-end architecture over the last ten years. Database
access from within personal computer applications such as
spreadsheets and word processors has grown dramatically, and an
entire set of PC-based front-end data entry and data access tools
has emerged. All of these applications require the features of
dynamic SQL.
More recently, the emergence of Internet-based "three-tier"
architectures, with applications logic executing on one ("mid-tier")
system and the database logic executing on another ("back-end")
system, has added new importance to capabilities that have
grown out of dynamic SQL. In most of these three-tier
environments, the applications logic running in the middle tier is
quite dynamic. It must be changed frequently to respond to new
business conditions and to implement new business rules. This
frequently changing environment is at odds with the very tight
coupling of applications programs and database contents implied
by static SQL. As a result, most three-tier architectures use a
callable SQL API (described in the next chapter) to link the middle
tier to back-end databases. These APIs explicitly borrow the key
concepts of dynamic SQL (for example, separate PREPARE and
EXECUTE steps and the EXECUTE IMMEDIATE capability) to
provide their database access. A solid understanding of dynamic
SQL concepts is thus important to help a programmer understand
what's going on "behind the scenes" of the SQL API. In
performance-sensitive applications, this understanding can make
all the difference between an application design that provides good
performance and response times and one that does not.
4.12 TRIGGERS
The concept of a trigger is relatively straightforward. For any
event that causes a change in the contents of a table, a user can
specify an associated action that the DBMS should carry out. The
three events that can trigger an action are attempts to INSERT,
DELETE, or UPDATE rows of the table. The action triggered by an
event is specified by a sequence of SQL statements.
To understand how a trigger works, let's examine a concrete
example. When a new order is added to the ORDERS table, these
two changes to the database should also take place:
- The SALES column for the salesperson who took the order
should be increased by the amount of the order.
- The QTY_ON_HAND amount for the product being ordered
should be decreased by the quantity ordered.
This Transact-SQL statement defines a SQL Server trigger,
named NEWORDER, that causes these database updates to
happen automatically:
CREATE TRIGGER NEWORDER
ON ORDERS
FOR INSERT
AS UPDATE SALESREPS
SET SALES = SALES + INSERTED.AMOUNT
FROM SALESREPS, INSERTED
WHERE SALESREPS.EMPL_NUM = INSERTED.REP
UPDATE PRODUCTS
SET QTY_ON_HAND = QTY_ON_HAND - INSERTED.QTY
FROM PRODUCTS, INSERTED
WHERE PRODUCTS.MFR_ID = INSERTED.MFR
AND PRODUCTS.PRODUCT_ID = INSERTED.PRODUCT
The first part of the trigger definition tells SQL Server that the
trigger is to be invoked whenever an INSERT statement is
attempted on the ORDERS table. The remainder of the definition
(after the keyword AS) defines the action of the trigger. In this case,
the action is a sequence of two UPDATE statements, one for the
SALESREPS table and one for the PRODUCTS table. The row
being inserted is referred to using the pseudo-table name inserted
within the UPDATE statements. As the example shows, SQL
Server extends the SQL language substantially to support triggers.
Other extensions not shown here include IF/THEN/ELSE tests,
looping, procedure calls, and even PRINT statements that display
user messages.
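The event-and-action behaviour of this trigger can be mimicked in plain Java. This is an analogy only, not how a DBMS implements triggers: the class TriggerAnalogy is hypothetical, in-memory maps stand in for the SALESREPS and PRODUCTS tables, and a product is identified by a made-up "MFR:PRODUCT" string key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class TriggerAnalogy {
    /** One row of ORDERS, reduced to the columns the trigger reads. */
    static final class Order {
        final int rep;
        final String mfr;
        final String product;
        final int qty;
        final double amount;

        Order(int rep, String mfr, String product, int qty, double amount) {
            this.rep = rep;
            this.mfr = mfr;
            this.product = product;
            this.qty = qty;
            this.amount = amount;
        }
    }

    final Map<Integer, Double> salesByRep = new HashMap<>();   // stands in for SALESREPS.SALES
    final Map<String, Integer> qtyOnHand = new HashMap<>();    // stands in for PRODUCTS.QTY_ON_HAND
    final List<Order> orders = new ArrayList<>();              // stands in for ORDERS
    private final List<Consumer<Order>> afterInsert = new ArrayList<>();

    /** Associates an action with the INSERT event, as CREATE TRIGGER does. */
    void onInsert(Consumer<Order> action) {
        afterInsert.add(action);
    }

    /** The triggering event: every registered action runs for the inserted row. */
    void insertOrder(Order inserted) {
        orders.add(inserted);
        for (Consumer<Order> action : afterInsert) {
            action.accept(inserted);
        }
    }

    /** Registers the two updates that the NEWORDER trigger performs. */
    static TriggerAnalogy withNewOrderTrigger() {
        TriggerAnalogy db = new TriggerAnalogy();
        db.onInsert(o -> db.salesByRep.merge(o.rep, o.amount, Double::sum));
        db.onInsert(o -> db.qtyOnHand.merge(o.mfr + ":" + o.product, -o.qty, Integer::sum));
        return db;
    }
}
```

The code that calls insertOrder never mentions the two updates, which is the point of a trigger: the rule lives with the data rather than in each application program.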
The trigger capability, while popular in many DBMS
products, is not a part of the ANSI/ISO SQL2 standard. As with
other SQL features whose popularity has preceded standardization,
this has led to a considerable divergence in trigger support across
various DBMS brands. Some of the differences between brands are
merely differences in syntax. Others reflect real differences in the
underlying capability.
DB2's trigger support provides an instructive example of the
differences. Here is the same trigger definition shown previously for
SQL Server, this time using the DB2 syntax:
CREATE TRIGGER NEWORDER
AFTER INSERT ON ORDERS
REFERENCING NEW AS NEW_ORD
FOR EACH ROW MODE DB2SQL
BEGIN ATOMIC
UPDATE SALESREPS
SET SALES = SALES + NEW_ORD.AMOUNT
WHERE SALESREPS.EMPL_NUM = NEW_ORD.REP;
UPDATE PRODUCTS
SET QTY_ON_HAND = QTY_ON_HAND - NEW_ORD.QTY
WHERE PRODUCTS.MFR_ID = NEW_ORD.MFR
AND PRODUCTS.PRODUCT_ID = NEW_ORD.PRODUCT;
END
The beginning of the trigger definition includes the same
elements as the SQL Server definition, but rearranges them. It
explicitly tells DB2 that the trigger is to be invoked AFTER a new
order is inserted into the database. DB2 also allows you to specify
that the trigger is to be carried out before a triggering action is
applied to the database contents. This doesn't make sense in this
example, because the triggering event is an INSERT operation, but
it does make sense for UPDATE or DELETE operations.
The DB2 REFERENCING clause specifies a table alias
(NEW_ORD) that will be used to refer to the row being inserted
throughout the remainder of the trigger definition. It serves the
same function as the INSERTED keyword in the SQL Server
trigger. The statement references the "new" values in the inserted
row because this is an INSERT operation trigger. For a DELETE
operation trigger, the "old" values would be referenced. For an
UPDATE operation trigger, DB2 gives you the ability to refer to both
the "old" (pre-UPDATE) values and "new" (post-UPDATE) values.
The BEGIN ATOMIC and END serve as brackets around the
sequence of SQL statements that define the triggered action. The
two searched UPDATE statements in the body of the trigger
definition are straightforward modifications of their SQL Server
counterparts. They follow the standard SQL syntax for searched
UPDATE statements, using the table alias specified by the
REFERENCING clause to identify the particular row of the
SALESREPS table and the PRODUCTS table to be updated.
Here is another example of a trigger definition, this time using
Informix Universal Server:
CREATE TRIGGER NEWORDER
INSERT ON ORDERS
AFTER (EXECUTE PROCEDURE NEW_ORDER)
This trigger again specifies an action that is to take place
AFTER a new order is inserted. In this case, the multiple SQL
statements that form the triggered action can't be specified directly
in the trigger definition. Instead, the triggered statements are placed
into an Informix stored procedure, named NEW_ORDER, and the
trigger causes the stored procedure to be executed. As this and the
preceding examples show, although the core concepts of a trigger
mechanism are very consistent across databases, the specifics
vary a great deal. Triggers are certainly among the least portable
aspects of SQL databases today.
4.12.1 Trigger Advantages and Disadvantages
A complete discussion of triggers is beyond the scope of this
book, but even these simple examples show the power of the
trigger mechanism. The major advantage of triggers is that
business rules can be stored in the database and enforced
consistently with each update to the database. This can
dramatically reduce the complexity of application programs that
access the database. Triggers also have some disadvantages,
including these:
- Database complexity. When the rules are moved into the
database, setting up the database becomes a more complex task.
Users who could reasonably be expected to create small, ad hoc
applications with SQL will find that the programming logic of
triggers makes the task much more difficult.
- Hidden rules. With the rules hidden away inside the database,
programs that appear to perform straightforward database updates
may, in fact, generate an enormous amount of database activity.
The programmer no longer has total control over what happens to
the database. Instead, a program-initiated database action may
cause other, hidden actions.
- Hidden performance implications. With triggers stored inside the
database, the consequences of executing a SQL statement are no
longer completely visible to the programmer. In particular, an
apparently simple SQL statement could, in concept, trigger a
process that involves a sequential scan of a very large database
table, which would take a long time to complete. These
performance implications of any given SQL statement are invisible
to the programmer.
5
DATABASE APPLICATION
DEVELOPMENT
Topics covered
5.1 Accessing Databases From Applications
5.2 Cursors
5.3 JDBC Driver Management
5.4 Executing SQL Statements
5.5 ResultSets
5.1 ACCESSING DATABASES FROM APPLICATIONS

In this section, we cover how SQL commands can be
executed from within a program in a host language such as C or
Java. The use of SQL commands within a host language program
is called Embedded SQL. Details of Embedded SQL also depend
on the host language. Although similar capabilities are supported
for a variety of host languages, the syntax sometimes varies.
5.1.1 Embedded SQL
Conceptually, embedding SQL commands in a host
language program is straightforward. SQL statements (i.e., not
declarations) can be used wherever a statement in the host
language is allowed (with a few restrictions). SQL statements must
be clearly marked so that a preprocessor can deal with them before
invoking the compiler for the host language. Also, any host
language variables used to pass arguments into an SQL command
must be declared in SQL. In particular, some special host language
variables must be declared in SQL (so that, for example, any error
conditions arising during SQL execution can be communicated
back to the main application program in the host language).
There are, however, two complications to bear in mind. First,
the data types recognized by SQL may not be recognized by the
host language and vice versa. This mismatch is typically addressed
by casting data values appropriately before passing them to or from
SQL commands. (SQL, like other programming languages,
provides an operator to cast values of one type into values of
another type.) The second complication has to do with SQL being
set-oriented, and is addressed using cursors.
Declaring Variables and Exceptions

SQL statements can refer to variables defined in the host
program. Such host-language variables must be prefixed by a colon
(:) in SQL statements and be declared between the commands
EXEC SQL BEGIN DECLARE SECTION and EXEC SQL END
DECLARE SECTION. The declarations are similar to how they
would look in a C program and, as usual in C, are separated by
semicolons. For example, we can declare variables c_sname, c_sid,
c_rating, and c_age (with the initial c used as a naming convention
to emphasize that these are host language variables) between
these two commands, using the C types char[20], long, short, and
float, respectively.
The first question that arises is which SQL types correspond
to the various C types, since we have just declared a collection of C
variables whose values are intended to be read (and possibly set)
in an SQL run-time environment when an SQL statement that refers
to them is executed. The SQL-92 standard defines such a
correspondence between the host language types and SQL types
for a number of host languages. In our example, c_sname has the
type CHARACTER(20) when referred to in an SQL statement, c_sid
has the type INTEGER, c_rating has the type SMALLINT, and c_age
has the type REAL. We also need some way for SQL to report what
went wrong if an error condition arises when executing an SQL
statement. The SQL-92 standard recognizes two special variables
for reporting errors, SQLCODE and SQLSTATE. SQLCODE is the
older of the two and is defined to return some negative value when
an error condition arises, without specifying further just what error a
particular negative integer denotes. SQLSTATE, introduced in the
SQL-92 standard for the first time, associates predefined values
with several common error conditions, thereby introducing some
uniformity to how errors are reported. One of these two variables
must be declared. The appropriate C type for SQLCODE is long
and the appropriate C type for SQLSTATE is char[6], that is, a
character string five characters long plus a terminating null.
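To see how a program might use SQLSTATE, here is a small sketch in Java, the host language of the JDBC sections later in this chapter. In JDBC the same five-character code is available from an SQLException through getSQLState(). The helper classify and its category labels are hypothetical; the sketch relies only on the standard convention that the first two characters of SQLSTATE form a class code, with 00 meaning success, 01 a warning, and 02 no data.

```java
import java.sql.SQLException;

class SqlStateSketch {
    /** Classifies a five-character SQLSTATE by its two-character class code. */
    static String classify(String sqlState) {
        if (sqlState == null || sqlState.length() != 5) {
            return "unknown";
        }
        switch (sqlState.substring(0, 2)) {
            case "00": return "success";
            case "01": return "warning";
            case "02": return "no data";
            default:   return "error";
        }
    }

    public static void main(String[] args) {
        // An exception built by hand, only to show where the code comes from.
        SQLException e = new SQLException("no more rows", "02000");
        System.out.println(e.getSQLState() + " -> " + classify(e.getSQLState()));
    }
}
```

Testing the class code rather than a vendor-specific SQLCODE value is what gives SQLSTATE its uniformity across systems.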
5.2 CURSORS
A major problem in embedding SQL statements in a host
language like C is that an impedance mismatch occurs because
SQL operates on sets of records, whereas languages like C do not
cleanly support a set-of-records abstraction. The solution is to
essentially provide a mechanism that allows us to retrieve rows one
at a time from a relation. This mechanism is called a cursor. We
can declare a cursor on any relation or on any SQL query (because
every query returns a set of rows). Once a cursor is declared, we
can open it (which positions the cursor just before the first row);
fetch the next row; move the cursor (to the next row, to the row
after the next n, to the first row, or to the previous row, etc., by
specifying additional parameters for the FETCH command); or
close the cursor. Thus, a cursor essentially allows us to retrieve the
rows in a table by positioning the cursor at a particular row and
reading its contents.
5.2.1 Basic Cursor Definition and Usage
Cursors enable us to examine, in the host language
program, a collection of rows computed by an Embedded SQL
statement:
1) We usually need to open a cursor if the embedded statement is
a SELECT (i.e., a query). However, we can avoid opening a cursor
if the answer contains a single row, as we see shortly.
2) INSERT, DELETE, and UPDATE statements typically require no
cursor, although some variants of DELETE and UPDATE use a
cursor.
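As an example, we can find the name and age of a sailor,
specified by assigning a value to the host variable c_sid, declared
earlier, with a SELECT statement whose INTO clause copies the
columns of the single answer row into host variables such as
c_sname and c_age. No cursor is needed in that case.
Now consider a query whose answer depends on a host variable
c_minrating and may contain many sailors. This query returns a
collection of rows, not just one row.
When executed interactively, the answers are printed on the
screen. If we embed this query in a C program by prefixing the
command with EXEC SQL, how can the answers be bound to host
language variables? The INTO clause is inadequate because we
must deal with several rows. The solution is to use a cursor,
declared with DECLARE sinfo CURSOR FOR followed by the query.
This declaration can be included in a C program, and once it is
executed, the cursor sinfo is defined. Subsequently, we can open
the cursor: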
OPEN sinfo;
The value of c_minrating in the SQL query associated with
the cursor is the value of this variable when we open the cursor.
(The cursor declaration is processed at compile-time, and the
OPEN command is executed at run-time.)
A cursor can be thought of as 'pointing' to a row in the
collection of answers to the query associated with it. When a cursor
is opened, it is positioned just before the first row. We can use the
FETCH command to read the first row of cursor sinfo into host
language variables:
FETCH sinfo INTO :c_sname, :c_age;
When the FETCH statement is executed, the cursor is
positioned to point at the next row (which is the first row in the table
when FETCH is executed for the first time after opening the cursor)
and the column values in the row are copied into the corresponding
host variables. By repeatedly executing this FETCH statement (say,
in a while-loop in the C program), we can read all the rows
computed by the query, one row at a time. Additional parameters to
the FETCH command allow us to position a cursor in very flexible
ways.
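The open, fetch, and close protocol described above can be imitated with a small in-memory class. This is a sketch of the cursor's behaviour only, written in Java for consistency with the JDBC sections that follow: the class CursorSketch is hypothetical, the list of rows stands in for an already computed query answer, and the sailor names in main are made-up sample data.

```java
import java.util.List;

class CursorSketch<T> {
    private final List<T> rows;   // the answer to the query associated with the cursor
    private int position;         // index of the current row; -1 means before the first row
    private boolean open;

    CursorSketch(List<T> rows) {
        this.rows = rows;
    }

    /** OPEN: position the cursor just before the first row. */
    void open() {
        position = -1;
        open = true;
    }

    /** FETCH: advance to the next row and return it, or null when no rows remain. */
    T fetch() {
        if (!open) {
            throw new IllegalStateException("cursor is not open");
        }
        if (position + 1 >= rows.size()) {
            return null;
        }
        position++;
        return rows.get(position);
    }

    /** CLOSE: the cursor can no longer be used until it is opened again. */
    void close() {
        open = false;
    }

    public static void main(String[] args) {
        CursorSketch<String> sinfo = new CursorSketch<>(List.of("Dustin", "Lubber", "Rusty"));
        sinfo.open();
        // The loop plays the role of the while-loop around FETCH in the C program.
        for (String name = sinfo.fetch(); name != null; name = sinfo.fetch()) {
            System.out.println(name);
        }
        sinfo.close();
    }
}
```

Returning null at the end is a simplification; an embedded SQL program would instead test SQLSTATE (or SQLCODE) after each FETCH to detect that no rows remain.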
5.3 JDBC DRIVER MANAGEMENT

In JDBC, data source drivers are managed by the
DriverManager class, which maintains a list of all currently loaded
drivers. The DriverManager class has methods registerDriver,
deregisterDriver, and getDrivers to enable dynamic addition and
deletion of drivers.
The first step in connecting to a data source is to load the
corresponding JDBC driver. This is accomplished by using the Java
mechanism for dynamically loading classes. The static method
forName in the Class class returns the Java class as specified in
the argument string and executes its static constructor.
The static constructor of the dynamically loaded class loads
an instance of the Driver class, and this Driver object registers itself
with the DriverManager class.
The following Java example code explicitly loads a JDBC
driver:
Class.forName("oracle.jdbc.driver.OracleDriver");
There are two other ways of registering a driver. We can
include the driver with -Djdbc.drivers=oracle.jdbc.driver.OracleDriver
at the command line when we start the Java application.
Alternatively, we can explicitly instantiate a driver, but this
method is used only rarely, as the name of the driver has to be
specified in the application code, and thus the application becomes
sensitive to changes at the driver level. After registering the driver,
we connect to the data source.
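Since Class.forName fails with a ClassNotFoundException when the driver is not on the class path, a program may want to check for the driver before going further. The following is a minimal sketch; the class DriverLoadingSketch and its two helper methods are hypothetical, while Class.forName and DriverManager.getDrivers are the standard calls discussed above.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

class DriverLoadingSketch {
    /** Tries to load a class by name; returns false instead of failing when it is absent. */
    static boolean tryLoad(String className) {
        try {
            Class.forName(className);   // runs the static initializer, which registers a driver
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Class names of the drivers currently registered with the DriverManager. */
    static List<String> registeredDrivers() {
        List<String> names = new ArrayList<>();
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            names.add(drivers.nextElement().getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        String driverClass = "oracle.jdbc.driver.OracleDriver";
        System.out.println(driverClass + " loadable: " + tryLoad(driverClass));
        System.out.println("registered drivers: " + registeredDrivers());
    }
}
```

On a machine without the Oracle driver installed, the first call simply reports that the class is not loadable and the list of registered drivers may be empty.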
5.3.1 Connections
A session with a data source is started through creation of a
Connection object. A connection identifies a logical session with a
data source; multiple connections within the same Java program
can refer to different data sources or the same data source.
Connections are specified through a JDBC URL, a URL that uses
the jdbc protocol. Such a URL has the form
jdbc:<subprotocol>:<otherParameters>
The following code example establishes a connection to an
Oracle database assuming that the strings userId and password are
set to valid values.
String url = "jdbc:oracle:www.bookstore.com:3083";
Connection connection;
try {
    connection = DriverManager.getConnection(url, userId, password);
}
catch (SQLException excpt) {
    System.out.println(excpt.getMessage());
    return;
}
In JDBC, connections can have different properties. For example, a
connection can specify the granularity of transactions. If
autocommit is set for a connection, then each SQL statement is
considered to be its own transaction. If autocommit is off, then a
series of statements that compose a transaction can be committed
using the commit() method of the Connection class, or aborted
using the rollback() method. The Connection class has methods to
set the autocommit mode (Connection.setAutoCommit) and to
retrieve the current autocommit mode (getAutoCommit). The
following methods are part of the Connection interface and permit
setting and getting other properties:
public int getTransactionIsolation() throws SQLException and
public void setTransactionIsolation(int level) throws SQLException.
These two functions get and set the current level of isolation for
transactions handled in the current connection. All five SQL levels
of isolation are possible, and argument level can be set as follows:
- TRANSACTION_NONE
- TRANSACTION_READ_UNCOMMITTED
- TRANSACTION_READ_COMMITTED
- TRANSACTION_REPEATABLE_READ
- TRANSACTION_SERIALIZABLE
public boolean isReadOnly() throws SQLException and public
void setReadOnly(boolean readOnly) throws SQLException. These
two functions allow the user to specify whether the transactions
executed through this connection are read only.
public boolean isClosed() throws SQLException.
Checks whether the current connection has already been closed.
setAutoCommit and getAutoCommit.
We already discussed these two functions.
Establishing a connection to a data source is a costly operation
since it involves several steps, such as establishing a network
connection to the data source, authentication, and allocation of
resources such as memory. In case an application establishes
many different connections from different parties (such as a Web
server), connections are often pooled to avoid this overhead. A
connection pool is a set of established connections to a data
source. Whenever a new connection is needed, one of the
connections from the pool is used, instead of creating a new
connection to the data source. Connection pooling can be handled
either by specialized code in the application, or the optional
javax.sql package, which provides functionality for connection
pooling and allows us to set different parameters, such as the
capacity of the pool, and shrinkage and growth rates.
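The first option, specialized code in the application, can be sketched as follows. This is a deliberately small illustration and not the javax.sql mechanism: the class PoolSketch is hypothetical, it is generic so that it can be run without a database, and in a real program the type parameter would be Connection and the factory would call DriverManager.getConnection.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

class PoolSketch<C> {
    private final Deque<C> idle = new ArrayDeque<>();   // established connections not in use
    private final Supplier<C> factory;                  // the costly "establish a connection" step
    private final int capacity;                         // maximum number of idle connections kept
    private int created;

    PoolSketch(Supplier<C> factory, int capacity) {
        this.factory = factory;
        this.capacity = capacity;
    }

    /** Hands out an idle connection if one exists; otherwise establishes a new one. */
    synchronized C acquire() {
        C connection = idle.pollFirst();
        if (connection == null) {
            connection = factory.get();
            created++;
        }
        return connection;
    }

    /** Returns a connection to the pool; false means the pool is full and the caller should close it. */
    synchronized boolean release(C connection) {
        if (idle.size() >= capacity) {
            return false;
        }
        idle.addFirst(connection);
        return true;
    }

    /** Number of times the costly factory step has actually run. */
    synchronized int createdCount() {
        return created;
    }
}
```

Each call to acquire that is served from the idle queue avoids one round of network setup and authentication, which is the overhead pooling is meant to remove.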
5.4 EXECUTING SQL STATEMENTS

We now discuss how to create and execute SQL statements
using JDBC. In the JDBC code examples in this section, we
assume that we have a Connection object named con. JDBC
supports three different ways of executing statements: Statement,
PreparedStatement, and CallableStatement. The Statement class
is the base class for the other two statement classes. It allows us to
query the data source with any static or dynamically generated SQL
query. The PreparedStatement class dynamically generates
precompiled SQL statements that can be used several times;
these SQL statements can have parameters, but their structure is
fixed when the PreparedStatement object (representing the SQL
statement) is created.
Consider the sample code using a PreparedStatement object
shown in the following Figure. The SQL query specifies the query
string, but uses '?' for the values of the parameters, which are set
later using methods setString, setFloat, and setInt. The '?'
placeholders can be used anywhere in SQL statements where they
can be replaced with a value. Examples of places where they can
appear include the WHERE clause (e.g., 'WHERE author=?'), or in
SQL UPDATE and INSERT statements.
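As a rough illustration of these calls, the sketch below inserts one row through a PreparedStatement. The table Books with columns for an ISBN, a title, a price, and a quantity is a hypothetical example, as are the class and method names; the Connection con is assumed to have been obtained as in Section 5.3.1.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class PreparedInsertSketch {
    // Hypothetical table Books(isbn, title, price, qty); one '?' per column value.
    static final String INSERT_SQL = "INSERT INTO Books VALUES (?, ?, ?, ?)";

    /** Inserts one row and returns the number of rows the statement modified. */
    static int insertBook(Connection con, String isbn, String title,
                          float price, int qty) throws SQLException {
        // The structure of the statement is fixed here; only the values vary.
        try (PreparedStatement pstmt = con.prepareStatement(INSERT_SQL)) {
            pstmt.clearParameters();        // remove any old parameter values
            pstmt.setString(1, isbn);       // parameters are numbered from 1
            pstmt.setString(2, title);
            pstmt.setFloat(3, price);
            pstmt.setInt(4, qty);
            return pstmt.executeUpdate();   // an INSERT returns a row count, not rows
        }
    }
}
```

The try-with-resources block closes the statement even if one of the calls throws an SQLException, so the caller only has to manage the connection itself.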
The method setString is one way to set a parameter value;
analogous methods are available for int, float, and date. It is good
style to always use clearParameters() before setting parameter
values in order to remove any old data. There are different ways of
submitting the query string to the data source. In the example, we
used the executeUpdate command, which is used if we know
that the SQL statement does not return any records (SQL
UPDATE, INSERT, ALTER, and DELETE statements). The
executeUpdate method returns an integer indicating the number of
rows the SQL statement modified; it returns 0 for successful
execution without modifying any rows. The executeQuery method is
used if the SQL statement returns data, such as in a regular
SELECT query. JDBC has its own cursor mechanism in the form of
a ResultSet object, which we discuss next.
5.5 RESULTSETS
As discussed in the previous section, the statement
executeQuery returns a ResultSet object, which is similar to a
cursor. ResultSet cursors in JDBC 2.0 are very powerful; they allow
forward and reverse scrolling and in-place editing and insertions.
In its most basic form, the ResultSet object allows us to read
one row of the output of the query at a time. Initially, the ResultSet
is positioned before the first row, and we have to retrieve the first
row with an explicit call to the next() method. The next method
returns false if there are no more rows in the query answer, and
true otherwise. The code fragment shown in the following Figure
illustrates the basic usage of a ResultSet object.
String sqlQuery;   // assumed to hold the text of a SELECT statement
ResultSet rs = stmt.executeQuery(sqlQuery);
// rs is now a cursor
// the first call to rs.next() moves to the first record;
// each later call to rs.next() moves to the next row
while (rs.next()) {
    // process the data
}
Figure: Using a ResultSet Object
While next() allows us to retrieve the logically next row in
the query answer, we can move about in the query answer in other
ways too:
- previous() moves back one row.
- absolute(int num) moves to the row with the specified number.
- relative(int num) moves forward or backward (if num is negative)
relative to the current position. relative(-1) has the same effect as
previous.
- first() moves to the first row, and last() moves to the last row.
6
OVERVIEW OF STORAGE AND INDEXING
Topics covered
6.1 Storage Hierarchies
6.2 Tree structured indexing and Hash Based indexing
6.1 STORAGE HIERARCHIES

The collection of data that makes up a computerized
database must be stored physically on some computer storage
medium. The DBMS software can then retrieve, update, and
process this data as needed. Computer storage media form a
storage hierarchy that includes two main categories:
Primary storage: This category includes storage media that can
be operated on directly by the computer central processing unit
(CPU), such as the computer main memory and smaller but faster
cache memories. Primary storage usually provides fast access to
data but is of limited storage capacity.
Secondary storage: This category includes magnetic disks,
optical disks, and tapes. These devices usually have a larger
capacity, cost less, and provide slower access to data than do
primary storage devices. Data in secondary storage cannot be
processed directly by the CPU; it must first be copied into primary
storage.
6.1.1 Memory Hierarchies and Storage Devices
In a modern computer system data resides and is
transported throughout a hierarchy of storage media. The
highest-speed memory is the most expensive and is therefore available
with the least capacity. The lowest-speed memory is tape storage,
which is essentially available in indefinite storage capacity. At the
primary storage level, the memory hierarchy includes at the most
expensive end cache memory, which is a static RAM (Random
Access Memory). Cache memory is typically used by the CPU to
speed up execution of programs. The next level of primary storage
is DRAM (Dynamic RAM), which provides the main work area for
the CPU for keeping programs and data and is popularly called
main memory. The advantage of DRAM is its low cost, which
continues to decrease; the drawback is its volatility and lower
speed compared with static RAM. At the secondary storage level,
the hierarchy includes magnetic disks, as well as mass storage in
the form of CD-ROM (Compact Disk Read-Only Memory) devices,
and finally tapes at the least expensive end of the hierarchy. The
storage capacity is measured in kilobytes (Kbyte or 1000 bytes),
megabytes (Mbyte or 1 million bytes), gigabytes (Gbyte or 1 billion
bytes), and even terabytes (1000 Gbytes). Programs reside and
execute in DRAM. Generally, large permanent databases reside on
secondary storage, and portions of the database are read into and
written from buffers in main memory as needed.
Now that personal computers and workstations have tens of
megabytes of data in DRAM, it is becoming possible to load a large
fraction of the database into main memory. In some cases, entire
databases can be kept in main memory (with a backup copy on
magnetic disk), leading to main memory databases; these are
particularly useful in real-time applications that require extremely
fast response times. An example is telephone switching
applications, which store databases that contain routing and line
information in main memory.
Between DRAM and magnetic disk storage, another form of
memory, flash memory, is becoming common, particularly because
it is nonvolatile. Flash memories are high-density,
high-performance memories using EEPROM (Electrically Erasable
Programmable Read-Only Memory) technology. The advantage of
flash memory is the fast access speed; the disadvantage is that an
entire block must be erased and written over at a time.
CD-ROM disks store data optically and are read by a laser.
CD-ROMs contain prerecorded data that cannot be overwritten.
WORM (Write-Once-Read-Many) disks are a form of optical
storage used for archiving data; they allow data to be written once
and read any number of times without the possibility of erasing.
They hold about half a gigabyte of data per disk and last much
longer than magnetic disks. Optical juke box memories use an
array of CD-ROM platters, which are loaded onto drives on
demand. Although optical juke boxes have capacities in the
hundreds of gigabytes, their retrieval times are in the hundreds of
milliseconds, quite a bit slower than magnetic disks. This type of
storage has not become as popular as it was expected to be
because of the rapid decrease in cost and increase in capacities of
magnetic disks. The DVD (Digital Video Disk) is a recent standard
for optical disks allowing four to fifteen gigabytes of storage per
disk. Finally, magnetic tapes are used for archiving and backup
storage of data. Tape jukeboxes, which contain a bank of tapes
that are catalogued and can be automatically loaded onto tape
drives, are becoming popular as tertiary storage to hold terabytes
of data. For example, NASA's EOS (Earth Observation Satellite)
system stores archived databases in this fashion. It is anticipated
that many large organizations will find it normal to have
terabyte-sized databases in a few years. The term very large database
cannot be defined precisely anymore because disk storage
capacities are on the rise and costs are declining. It may very soon
be reserved for databases containing tens of terabytes.
6.1.2 Storage of Databases
Databases typically store large amounts of data that must
persist over long periods of time. The data is accessed and
processed repeatedly during this period. This contrasts with the
notion of transient data structures that persist for only a limited time
during program execution. Most databases are stored permanently
(or persistently) on magnetic disk secondary storage, for the
following reasons:
- Generally, databases are too large to fit entirely in main memory.
- The circumstances that cause permanent loss of stored data arise
less frequently for disk secondary storage than for primary storage.
Hence, we refer to disk, and other secondary storage devices, as
nonvolatile storage, whereas main memory is often called volatile
storage.
- The cost of storage per unit of data is an order of magnitude less
for disk than for primary storage.
Some of the newer technologies, such as optical disks,
DVDs, and tape jukeboxes, are likely to provide viable alternatives
to the use of magnetic disks. Databases in the future may therefore
reside at different levels of the memory hierarchy. For now,
however, it is important to study and understand the properties and
characteristics of magnetic disks and the way data files can be
organized on disk in order to design effective databases with
acceptable performance.
Magnetic tapes are frequently used as a storage medium for
backing up the database because storage on tape costs even less
than storage on disk. However, access to data on tape is quite
slow. Data stored on tapes is off-line; that is, some intervention by
an operator (or an automatic loading device) to load a tape is
needed before this data becomes available. In contrast, disks are
on-line devices that can be accessed directly at any time.
The techniques used to store large amounts of structured
data on disk are important for database designers, the DBA, and
implementers of a DBMS. Database designers and the DBA must
know the advantages and disadvantages of each storage technique
when they design, implement, and operate a database on a specific
DBMS. Usually, the DBMS has several options available for
organizing the data, and the process of physical database design
involves choosing from among the options the particular data
organization techniques that best suit the given application
requirements. DBMS system implementers must study data
organization techniques so that they can implement them efficiently
and thus provide the DBA and users of the DBMS with sufficient
options.
Typical database applications need only a small portion of
the database at a time for processing. Whenever a certain portion
of the data is needed, it must be located on disk, copied to main
memory for processing, and then rewritten to the disk if the data is
changed. The data stored on disk is organized as files of records.
Each record is a collection of data values that can be interpreted as
facts about entities, their attributes, and their relationships. Records
should be stored on disk in a manner that makes it possible to
locate them efficiently whenever they are needed.
There are several primary file organizations, which
determine how the records of a file are physically placed on the
disk, and hence how the records can be accessed. A heap file (or
unordered file) places the records on disk in no particular order by
appending new records at the end of the file, whereas a sorted file
(or sequential file) keeps the records ordered by the value of a
particular field (called the sort key). A hashed file uses a hash
function applied to a particular field (called the hash key) to
determine a record's placement on disk. Other primary file
organizations, such as B-trees, use tree structures. A secondary
organization or auxiliary access structure allows efficient access to
the records of a file based on fields other than those that have
been used for the primary file organization.
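The three primary organizations described above can be contrasted with a minimal in-memory sketch. The records, the bucket count, and the modulo hash function are hypothetical choices made only for illustration; a real DBMS works with disk pages rather than Python lists.

```python
import bisect

# Hypothetical (key, name) records; the first field plays the role of
# the sort key and the hash key.
records = [(30, "Ashby"), (22, "Basu"), (44, "Cass"), (25, "Daniels")]

# Heap (unordered) file: new records are appended at the end.
heap_file = []
for rec in records:
    heap_file.append(rec)

# Sorted (sequential) file: records are kept ordered by the sort key.
sorted_file = []
for rec in records:
    bisect.insort(sorted_file, rec)

# Hashed file: a hash function on the hash key picks the bucket.
NUM_BUCKETS = 4  # assumption for this sketch
hashed_file = [[] for _ in range(NUM_BUCKETS)]
for rec in records:
    hashed_file[rec[0] % NUM_BUCKETS].append(rec)
```

In this sketch an equality lookup on the key touches one bucket of the hashed file, while the heap file has to be scanned from start to end.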
6.2 TREE STRUCTURED INDEXING AND HASH BASED INDEXING
Hash-based indexes are best for equality selections; they cannot
support range searches.
B+-trees are best for sorted access and range queries.
6.2.1 Tree Structured Indexing

The data entries are arranged in sorted order by search key
value, and a hierarchical search data structure is maintained that
directs searches to the correct page of data entries.
Figure 8.3 shows the employee records organized in a
tree-structured index with search key age. Each node in this figure
(e.g., nodes labeled A, B, L1, L2) is a physical page, and retrieving
a node involves a disk I/O.
The lowest level of the tree, called the leaf level, contains the
data entries; in our example, these are employee records. To
illustrate the ideas better, we have drawn the figure as if there
were additional employee records, some with age less than 22 and
some with age greater than 50. Additional records with age less
than 22 would appear in leaf pages to the left of page L1, and records
with age greater than 50 would appear in leaf pages to the right of
page L3.
Figure: Tree-Structured Index

This structure allows us to efficiently locate all data entries
with search key values in a desired range. All searches begin at the
topmost node, called the root, and the contents of pages in non-leaf
levels direct searches to the correct leaf page. Non-leaf pages
contain node pointers separated by search key values. The node
pointer to the left of a key value k points to a subtree that contains
only data entries less than k. The node pointer to the right of a key
value k points to a subtree that contains only data entries greater
than or equal to k.
In our example, suppose we want to find all data entries with
24 < age < 50. We look for data entries with search
key value > 24, and get directed to the middle child, node A. Again,
examining the contents of this node, we are directed to node B.
Examining the contents of node B, we are directed to leaf node L1,
which contains data entries we are looking for. Observe that leaf
nodes L2 and L3 also contain data entries that satisfy our search
criterion. To facilitate retrieval of such qualifying entries during
search, all leaf pages are maintained in a doubly-linked list. Thus,
we can fetch page L2 using the 'next' pointer on page L1, and then
fetch page L3 using the 'next' pointer on L2. The number of
disk I/Os incurred during a search is therefore equal to the length
of a path from the root to a leaf, plus the number of leaf pages with
qualifying data entries. The B+ tree is an index structure that
ensures that all paths from the root to a leaf in a given tree are of
the same length, that is, the structure is always balanced in height.
Finding the correct leaf page is faster than binary search of the
pages in a sorted file because each non-leaf node can accommodate
a very large number of node pointers, and the height of the tree is
rarely more than three or four in practice. The height of a balanced
tree is the length of a path from root to leaf; in Figure 8.3, the
height is three. The number of I/Os to retrieve a desired leaf page is
four, including the root and the leaf page. (In practice, the root is
typically in the buffer pool because it is frequently accessed, and we
really incur just three I/Os for a tree of height three.)
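The range search just described can be sketched in a few lines. This is a simplified illustration, not the index of Figure 8.3: it assumes a single non-leaf level, and the leaf contents, the names leaves and separators, and the function range_search are all hypothetical.

```python
from bisect import bisect_right

# Leaf pages in key order; each inner list stands for one page of
# data entries (hypothetical ages).
leaves = [[22, 25], [29, 30, 33], [40, 44], [50, 55]]

# One non-leaf level: the pointer to the right of separator k leads to
# entries >= k, the pointer to its left leads to entries < k.
separators = [leaf[0] for leaf in leaves[1:]]  # [29, 40, 50]

def range_search(low, high):
    """Return entries with low < key < high and the leaf pages read."""
    start = bisect_right(separators, low)  # descend to the first leaf
    matches, pages_read = [], 0
    for leaf in leaves[start:]:            # follow the 'next' pointers
        pages_read += 1
        matches.extend(k for k in leaf if low < k < high)
        if leaf[-1] >= high:               # later leaves cannot qualify
            break
    return matches, pages_read

matches, pages_read = range_search(24, 50)
```

Here pages_read counts every leaf page visited, including the last one, which is read only to discover that the range has ended.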
The average number of children for a non-leaf node is called
the fan-out of the tree. If every non-leaf node has n children, a tree
of height h has $n^h$ leaf pages. In practice, nodes do not have the
same number of children, but using the average value F for n, we
still get a good approximation to the number of leaf pages, $F^h$. In
practice, F is at least 100, which means a tree of height four
contains 100 million leaf pages. Thus, we can search a file with 100
million leaf pages and find the page we want using four I/Os; in
contrast, binary search of the same file would take
$\log_2 100{,}000{,}000$ (over 25) I/Os.
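The arithmetic behind this comparison can be checked directly. The variable names below are chosen for this sketch only; F and h take the values used in the text.

```python
import math

F = 100  # average fan-out
h = 4    # height of the tree

leaf_pages = F ** h                        # 100,000,000 leaf pages
tree_ios = h                               # one I/O per level, as in the text
binary_search_ios = math.log2(leaf_pages)  # between 26 and 27
```

Each additional level multiplies the number of reachable leaf pages by F, which is why the tree stays shallow even for very large files.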
6.2.2 Hash Based indexing
We can organize records using a technique called hashing
to quickly find records that have a given search key value. For
example, if the file of employee records is hashed on the name
field, we can retrieve all records about Joe. In this approach, the
records in a file are grouped in buckets, where a bucket consists of
a primary page and, possibly, additional pages linked in a chain.
The bucket to which a record belongs can be determined by
applying a special function, called a hash function, to the search
key. Given a bucket number, a hash-based index structure allows
us to retrieve the primary page for the bucket in one or two disk
I/Os. On inserts, the record is inserted into the appropriate bucket,
with 'overflow' pages allocated as necessary. To search for a
record with a given search key value, we apply the hash function to
identify the bucket to which such records belong and look at all
pages in that bucket. If we do not have the search key value for the
record (for example, the index is based on sal and we want records
with a given age value), we have to scan all pages in the file. In this
chapter, we assume that applying the hash function to (the search
key of) a record allows us to identify and retrieve the page
containing the record with one I/O. In practice, hash-based index
structures that adjust gracefully to inserts and deletes and allow us
to retrieve the page containing a record in one to two I/Os are
known. Hash indexing is illustrated in the Figure below, where the
data is stored in a file that is hashed on age; the data entries in this
first index file are the actual data records. Applying the hash
function to the age field identifies the page that the record belongs
to. The hash function h for this example is quite simple; it converts
the search key value to its binary representation and uses the two
least significant bits as the bucket identifier. The Figure also shows
an index with search key sal that contains (sal, rid) pairs as data
entries. The rid (short for record id) component of a data entry in
this second index is a pointer to a record with search key value sal
(and is shown in the figure as an arrow pointing to the data record).
The file of employee records is hashed on age, and Alternative (1)
is used for data entries. The second index, on sal, also uses
hashing to locate data entries, which are now (sal, rid of employee
record) pairs; that is, Alternative (2) is used for data
entries.
Figure: Index-Organized File Hashed on age, with Auxiliary Index on sal
Note that the search key for an index can be any sequence
of one or more fields, and it need not uniquely identify records. For
example, in the salary index, two data entries have the same
search key value 6003.
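The two indexes described above can be sketched as follows. The employee records are hypothetical stand-ins for the figure, a Python dict stands in for the hashed sal index, and a rid is modelled as a (bucket, slot) pair; only the hash function on age (the two least significant bits) comes from the text.

```python
def h(age):
    """Bucket identifier: the two least significant bits of age."""
    return age & 0b11

# Hypothetical (name, age, sal) records.
employees = [
    ("Smith", 44, 3000), ("Jones", 40, 6003), ("Tracy", 44, 5004),
    ("Ashby", 25, 3000), ("Basu", 33, 4003), ("Bristow", 29, 2007),
    ("Cass", 50, 5004), ("Daniels", 22, 6003),
]

# File hashed on age: the data entries are the records themselves.
age_file = {bucket: [] for bucket in range(4)}
for rec in employees:
    age_file[h(rec[1])].append(rec)

# Auxiliary index on sal: the data entries are (sal, rid) pairs,
# stored here as sal -> list of rids.
sal_index = {}
for bucket, page in age_file.items():
    for slot, (_, _, sal) in enumerate(page):
        sal_index.setdefault(sal, []).append((bucket, slot))

def find_by_age(age):
    return [rec for rec in age_file[h(age)] if rec[1] == age]

def find_by_sal(sal):
    return [age_file[bucket][slot] for bucket, slot in sal_index.get(sal, [])]
```

A lookup by age inspects a single bucket, and a lookup by sal follows rids into the age file. A query on name would have to scan every bucket, because neither index uses name as its search key.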
Summary:
- Many alternative file organizations exist, each appropriate in
  some situation.
- If selection queries are frequent, sorting the file or building
  an index is important.
- Hash-based indexes only good for equality search.
- Sorted files and tree-based indexes best for range search;
  also good for equality search.
- Files rarely kept sorted in practice; B+ tree index is better.
- Index is a collection of data entries plus a way to quickly find
  entries with given key values.
- Data entries can be:
    o actual data records,
    o <key, rid> pairs, or
    o <key, rid-list> pairs.
- Can have several indexes on a given file of data records, each
  with a different search key.
- Indexes can be classified as clustered vs. unclustered.
  Differences have important consequences for
  utility/performance of query processing.
- Several kinds of integrity constraints can be expressed in the
  ER model: key constraints, participation constraints, and
  overlap/covering constraints for ISA hierarchies. Some foreign key
  constraints are also implicit in the definition of a relationship set.
- Some constraints (notably, functional dependencies) cannot
  be expressed in the ER model.
- Constraints play an important role in determining the best
  database design for an enterprise.
7
QUERY EVALUATION OVERVIEW
Topics covered
7.1 Overview of Query optimization
7.2 Relational optimization
7.1 OVERVIEW OF QUERY OPTIMIZATION

Query optimization is a function of many relational database
management systems in which multiple query plans for satisfying a
query are examined and a good query plan is identified. This may
or may not be the absolute best strategy because there are many
ways of building plans. There is a trade-off between the amount of
time spent figuring out the best plan and the amount of time spent
running the plan. Different database management systems balance
these two in different ways. Cost-based query optimizers evaluate
the resource footprint of various query plans and use this as the
basis for plan selection.
Typically the resources which are costed are CPU path
length, amount of disk buffer space, disk storage service time, and
interconnect usage between units of parallelism. The set of query
plans examined is formed by examining possible access paths
(e.g., primary index access, secondary index access, full file scan)
and various relational table join techniques (e.g., merge join, hash
join, product join). The search space can become quite large
depending on the complexity of the SQL query. There are two types
of optimization: logical optimization, which generates a sequence
of relational algebra operations to solve the query, and physical
optimization, which determines the means of carrying out each
operation.
7.1.1 Query Evaluation Plan
A query evaluation plan (or simply plan) consists of an
extended relational algebra tree, with additional annotations at each
node indicating the access methods to use for each table and the
implementation method to use for each relational operator.
Consider the following SQL query:
This expression is shown in the form of a tree in the following
Figure. The algebra expression partially specifies how to evaluate
the query: we first compute the natural join of Reserves and
Sailors, then perform the selections, and finally project the sname
field.
To obtain a fully specified evaluation plan, we must decide
on an implementation for each of the algebra operations involved.
For example, we can use a page-oriented simple nested loops join
with Reserves as the outer table and apply selections and
projections to each tuple in the result of the join as it is produced;
the result of the join before the selections and projections is never
stored in its entirety.
In drawing the query evaluation plan, we have used the
convention that the outer table is the left child of the join operator.
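A plan of this shape can be sketched as a Python generator, which never materializes the join result. The sample tuples, the page size, and the two selection conditions (bid equal to 100, rating greater than 5) are assumptions made for this sketch; the text does not state the actual query.

```python
def pages(table, page_size=2):
    """Split a table (a list of tuples as dicts) into fixed-size pages."""
    for i in range(0, len(table), page_size):
        yield table[i:i + page_size]

def evaluate_plan(reserves, sailors):
    """Page-oriented simple nested loops join, Reserves as outer table.

    Selections and the projection are applied to each joined tuple
    as it is produced (pipelining).
    """
    for r_page in pages(reserves):        # outer table, page by page
        for s_page in pages(sailors):     # inner scan per outer page
            for r in r_page:
                for s in s_page:
                    if r["sid"] != s["sid"]:               # join condition
                        continue
                    if r["bid"] == 100 and s["rating"] > 5:  # hypothetical
                        yield s["sname"]                   # projection

# Hypothetical sample data.
sailors = [
    {"sid": 22, "sname": "Dustin", "rating": 7},
    {"sid": 31, "sname": "Lubber", "rating": 8},
    {"sid": 58, "sname": "Rusty", "rating": 10},
    {"sid": 64, "sname": "Horatio", "rating": 3},
]
reserves = [
    {"sid": 22, "bid": 100}, {"sid": 58, "bid": 103},
    {"sid": 64, "bid": 100}, {"sid": 31, "bid": 100},
]

result = list(evaluate_plan(reserves, sailors))
```

Because evaluate_plan yields one name at a time, the consumer sees results while the join is still running, which is the pipelining behaviour the plan relies on.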
7.2 RELATIONAL OPTIMIZATION: cost of a plan, estimating result sizes
For each enumerated plan, we have to estimate its cost.
There are two parts to estimating the cost of an evaluation plan for
a query block:
1. For each node in the tree, we must estimate the cost of
performing the corresponding operation. Costs are affected
significantly by whether pipelining is used or temporary relations
are created to pass the output of an operator to its parent.
2. For each node in the tree, we must estimate the size of the result
and whether it is sorted. This result is the input for the operation
that corresponds to the parent of the current node, and the size and
sort order in turn affect the estimation of size, cost, and sort order
for the parent.
Estimating costs requires knowledge of various parameters of the
input relations, such as the number of pages and available indexes.
Such statistics are maintained in the DBMS's system catalogs. In
this section, we describe the statistics maintained by a typical DBMS
and discuss how result sizes are estimated. We use the number of
page I/Os as the metric of cost and ignore issues such as blocked
access, for the sake of simplicity. The estimates used by a DBMS for
result sizes and costs are at best approximations to actual sizes and
costs. It is unrealistic to expect an optimizer to find the very best
plan; it is more important to avoid the worst plans and find a good
plan.
7.2.1 Estimating Result Sizes
We now discuss how a typical optimizer estimates the size
of the result computed by an operator on given inputs. Size
estimation plays an important role in cost estimation as well
because the output of one operator can be the input to another
operator, and the cost of an operator depends on the size of its
inputs.
Consider a query block of the form:
The maximum number of tuples in the result of this query
(without duplicate elimination) is the product of the cardinalities of
the relations in the FROM clause. Every term in the WHERE
clause, however, eliminates some of these potential result tuples.
We can model the effect of the WHERE clause on the result size by
associating a reduction factor with each term, which is the ratio of
the (expected) result size to the input size considering only the
selection represented by the term. The actual size of the result can
be estimated as the maximum size times the product of the
reduction factors for the terms in the WHERE clause. Of course,
this estimate reflects the unrealistic but simplifying assumption that
the conditions tested by each term are statistically independent.
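The estimate described above is a product of two products, which the following sketch computes. The function name, the cardinalities, and the reduction factors are hypothetical numbers chosen only to show the calculation.

```python
from math import prod

def estimate_result_size(cardinalities, reduction_factors):
    """Maximum size (product of the cardinalities of the relations in
    the FROM clause) times the product of the reduction factors of the
    terms in the WHERE clause."""
    return prod(cardinalities) * prod(reduction_factors)

# Hypothetical two-relation query with three WHERE terms.
cardinalities = [100_000, 40_000]
reduction_factors = [1 / 40_000, 0.01, 0.5]

estimate = estimate_result_size(cardinalities, reduction_factors)
```

Multiplying the reduction factors is valid only under the independence assumption stated above; correlated terms can make the true result size differ considerably from the estimate.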
8
SCHEMA REFINEMENT AND NORMAL
FORMS
Topics covered
8.1 Functional Dependencies
8.2 Second Normal Form
8.3 Third Normal Form
8.4 Fourth Normal Form
8.6 Fifth Normal Form
8.7 BCNF
8.8 Comparison of 3NF and BCNF
8.9 Lossless and dependency preserving decomposition
8.10 Closure of Dependencies
8.11 Minimal Closure (Cover)
8.1 FUNCTIONAL DEPENDENCIES:

Functional dependency describes the relationship between
attributes in a relation. For example, if A and B are attributes of
relation R, then B is functionally dependent on A (denoted A → B) if
each value of A is associated with exactly one value of B. (A and B
may each consist of one or more attributes.)
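The definition can be tested against a relation instance with a short sketch. The function name fd_holds and the sample rows are hypothetical. Note that an instance can only refute a dependency: a functional dependency is a statement about all legal instances, so a passing check does not prove that it holds.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the dependency lhs -> rhs is satisfied by rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # The same left-hand value must always map to the same
        # right-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True

# Hypothetical instance of R(A, B).
rows = [
    {"A": "a1", "B": "b1"},
    {"A": "a2", "B": "b1"},
    {"A": "a1", "B": "b1"},
]

a_determines_b = fd_holds(rows, ["A"], ["B"])
b_determines_a = fd_holds(rows, ["B"], ["A"])
```

In this instance each value of A is associated with exactly one value of B, while the value b1 appears with both a1 and a2, so B does not determine A.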
Trivial functional dependency means that the right-hand
side is a subset (not necessarily a proper subset) of the left-hand
side. Trivial dependencies do not provide any additional information
about possible integrity constraints on the values held by these
attributes.
We are normally more interested in nontrivial
dependencies because they represent integrity constraints for the
relation.
Main characteristics of functional dependencies used in
normalization:
- they have a one-to-one relationship between attribute(s) on the
  left- and right-hand side of a dependency;
- they hold for all time;
- they are nontrivial.
Functional dependency is a property of the meaning or
semantics of the attributes in a relation. When a functional
dependency is present, the dependency is specified as a
constraint between the attributes.
An important integrity constraint to consider first is the
identification of candidate keys, one of which is selected to be
the primary key for the relation using functional dependency.
8.1 FIRST NORMAL FORM:

A basic objective of the first normal form defined by Codd in
1970 was to permit data to be queried and manipulated using a
"universal data sub-language" grounded in first-order logic. (SQL is
an example of such a data sub-language, albeit one that Codd
regarded as seriously flawed.) Querying and manipulating the
data within an unnormalized data structure, such as the following
non-1NF representation of customers' credit card transactions,
involves more complexity than is really necessary:
Customer   Transactions
Jones      Tr. ID   Date          Amount
           12890    14-Oct-2003   -87
           12904    15-Oct-2003   -50
Wilkins    Tr. ID   Date          Amount
           12898    14-Oct-2003   -21
Stevens    Tr. ID   Date          Amount
           12907    15-Oct-2003   -18
           14920    20-Nov-2003   -70
           15003    27-Nov-2003   -60
To each customer there corresponds a repeating group of
transactions. The automated evaluation of any query relating to
customers' transactions therefore would broadly involve two stages:
1. Unpacking one or more customers' groups of transactions
allowing the individual transactions in a group to be
examined, and
2. Deriving a query result based on the results of the first stage.
For example, in order to find out the monetary sum of all
transactions that occurred in October 2003 for all customers, the
system would have to know that it must first unpack the
Transactions group of each customer, then sum the Amounts of all
transactions thus obtained where the Date of the transaction falls in
October 2003.
One of Codd's important insights was that this structural
complexity could always be removed completely, leading to much
greater power and flexibility in the way queries could be formulated
(by users and applications) and evaluated (by the DBMS). The
normalized equivalent of the structure above would look like this:
Customer   Tr. ID   Date          Amount
Jones      12890    14-Oct-2003   -87
Jones      12904    15-Oct-2003   -50
Wilkins    12898    14-Oct-2003   -21
Stevens    12907    15-Oct-2003   -18
Stevens    14920    20-Nov-2003   -70
Stevens    15003    27-Nov-2003   -60
Now each row represents an individual credit card
transaction, and the DBMS can obtain the answer of interest,
simply by finding all rows with a Date falling in October, and
summing their Amounts. All of the values in the data structure are
on an equal footing: they are all exposed to the DBMS directly, and
can directly participate in queries, whereas in the previous situation
some values were embedded in lower-level structures that had to
be handled specially. Accordingly, the normalized design lends
itself to general-purpose query processing, whereas the
unnormalized design does not.
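The difference between the two designs can be illustrated by computing the October 2003 total both ways. The data is the sample data of this section; the Python structures and names (nested, flat, in_october_2003) are assumptions made for this sketch.

```python
from datetime import date

def in_october_2003(d):
    return d.year == 2003 and d.month == 10

# Unnormalized: each customer carries a repeating group of transactions.
nested = {
    "Jones": [(12890, date(2003, 10, 14), -87),
              (12904, date(2003, 10, 15), -50)],
    "Wilkins": [(12898, date(2003, 10, 14), -21)],
    "Stevens": [(12907, date(2003, 10, 15), -18),
                (14920, date(2003, 11, 20), -70),
                (15003, date(2003, 11, 27), -60)],
}

# Stage 1: unpack every group. Stage 2: derive the result.
unpacked = [t for group in nested.values() for t in group]
total_nested = sum(amt for _, d, amt in unpacked if in_october_2003(d))

# Normalized (1NF): one row per transaction, all values exposed directly.
flat = [
    ("Jones", 12890, date(2003, 10, 14), -87),
    ("Jones", 12904, date(2003, 10, 15), -50),
    ("Wilkins", 12898, date(2003, 10, 14), -21),
    ("Stevens", 12907, date(2003, 10, 15), -18),
    ("Stevens", 14920, date(2003, 11, 20), -70),
    ("Stevens", 15003, date(2003, 11, 27), -60),
]
total_flat = sum(amt for _, _, d, amt in flat if in_october_2003(d))
```

Both totals agree, but the second computation is a single pass over uniform rows with no structure-specific unpacking step.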
The objectives of normalization beyond 1NF were stated as follows
by Codd:
1. To free the collection of relations from undesirable insertion,
update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations as
new types of data are introduced, and thus increase the life
span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics,
where these statistics are liable to change as time goes by.
E.F. Codd, "Further Normalization of the Data Base Relational
Model."
8.2 SECOND NORMAL FORM

Second normal form (2NF) is a normal form used in
database normalization. 2NF was originally defined by E.F. Codd in
1971. A table that is in first normal form (1NF) must meet additional
criteria if it is to qualify for second normal form. Specifically: a 1NF
table is in 2NF if and only if, given any candidate key K and any
attribute A that is not a constituent of a candidate key, A depends
upon the whole of K rather than just a part of it.
In slightly more formal terms: a 1NF table is in 2NF if and
only if all its non-prime attributes are functionally dependent on the
whole of a candidate key. (A non-prime attribute is one that does
not belong to any candidate key.)
Note that when a 1NF table has no composite candidate
keys (candidate keys consisting of more than one attribute), the
table is automatically in 2NF.
Consider a table describing employees' skills:
Employees' Skills
Employee   Skill            Current Work Location
Jones      Typing           114 Main Street
Jones      Shorthand        114 Main Street
Jones      Whittling        114 Main Street
Bravo      Light Cleaning   73 Industrial Way
Ellis      Alchemy          73 Industrial Way
Ellis      Flying           73 Industrial Way
Harrison   Light Cleaning   73 Industrial Way
Neither {Employee} nor {Skill} is a candidate key for the
table. This is because a given Employee might need to appear
more than once (he might have multiple Skills), and a given Skill
might need to appear more than once (it might be possessed by
multiple Employees). Only the composite key {Employee, Skill}
qualifies as a candidate key for the table.
The remaining attribute, Current Work Location, is
dependent on only part of the candidate key, namely Employee.
Therefore the table is not in 2NF. Note the redundancy in the way
Current Work Locations are represented: we are told three times
that Jones works at 114 Main Street, and twice that Ellis works at
73 Industrial Way. This redundancy makes the table vulnerable to
update anomalies: it is, for example, possible to update Jones' work
location on his "Typing" and "Shorthand" records and not update
his "Whittling" record. The resulting data would imply contradictory
answers to the question "What is Jones' current work location?"
A 2NF alternative to this design would represent the same
information in two tables: an "Employees" table with candidate key
{Employee}, and an "Employees' Skills" table with candidate key
{Employee, Skill}:
Employees
Employee   Current Work Location
Jones      114 Main Street
Bravo      73 Industrial Way
Ellis      73 Industrial Way
Harrison   73 Industrial Way

Employees' Skills
Employee   Skill
Jones      Typing
Jones      Shorthand
Jones      Whittling
Bravo      Light Cleaning
Ellis      Alchemy
Ellis      Flying
Harrison   Light Cleaning
Neither of these tables can suffer from update anomalies.
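The 2NF decomposition can be reproduced by projecting the original table and joining the projections back together. The rows are the sample data of this section; the variable names are chosen for this sketch.

```python
# (Employee, Skill, Current Work Location) rows of the original table.
rows = [
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
    ("Bravo", "Light Cleaning", "73 Industrial Way"),
    ("Ellis", "Alchemy", "73 Industrial Way"),
    ("Ellis", "Flying", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
]

# Projection on {Employee, Current Work Location}: one row per employee.
employees = sorted({(emp, loc) for emp, _, loc in rows})

# Projection on {Employee, Skill}: the whole key, no other attributes.
employee_skills = sorted({(emp, skill) for emp, skill, _ in rows})

# Joining the projections on Employee recovers the original rows.
location_of = dict(employees)
rejoined = sorted((emp, skill, location_of[emp])
                  for emp, skill in employee_skills)
```

After the split, Jones' location is stored once, so it can no longer be updated on some rows and missed on others.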

Not all 2NF tables are free from update anomalies, however.
An example of a 2NF table which suffers from update anomalies is:
Tournament Winners
Tournament             Year   Winner           Winner Date of Birth
Des Moines Masters     1998   Chip Masterson   14 March 1977
Indiana Invitational   1998   Al Fredrickson   21 July 1975
Cleveland Open         1999   Bob Albertson    28 September 1968
Des Moines Masters     1999   Al Fredrickson   21 July 1975
Indiana Invitational   1999   Chip Masterson   14 March 1977
Even though Winner and Winner Date of Birth are
determined by the whole key {Tournament / Year} and not part of it,
particular Winner / Winner Date of Birth combinations are shown
redundantly on multiple records. This leads to an update anomaly:
if updates are not carried out consistently, a particular winner could
be shown as having two different dates of birth.
The underlying problem is the transitive dependency to
which the Winner Date of Birth attribute is subject. Winner Date of
Birth actually depends on Winner, which in turn depends on the key
Tournament / Year.
This problem is addressed by third normal form (3NF).
8.3 THIRD NORMAL FORM
The third normal form (3NF) is a normal form used in
database normalization. 3NF was originally defined by E.F. Codd in
1971. Codd's definition states that a table is in 3NF if and only if
both of the following conditions hold:
1. The relation R (table) is in second normal form (2NF).
2. Every non-prime attribute of R is non-transitively dependent
(i.e. directly dependent) on every key of R.
A non-prime attribute of R is an attribute that does not belong
to any candidate key of R. A transitive dependency is a functional
dependency in which X → Z (X determines Z) indirectly, by virtue of
X → Y and Y → Z (where it is not the case that Y → X).
A 3NF definition that is equivalent to Codd's, but expressed
differently, was given by Carlo Zaniolo in 1982. This definition states
that a table is in 3NF if and only if, for each of its functional
dependencies X → A, at least one of the following conditions
holds:
1. X contains A (that is, X → A is a trivial functional dependency), or
2. X is a superkey, or
3. A is a prime attribute (i.e., A is contained within a candidate
key).
An example of a 2NF table that fails to meet the requirements of
3NF is:
Tournament Winners
Tournament             Year   Winner           Winner Date of Birth
Indiana Invitational   1998   Al Fredrickson   21 July 1975
Cleveland Open         1999   Bob Albertson    28 September 1968
Des Moines Masters     1999   Al Fredrickson   21 July 1975
Indiana Invitational   1999   Chip Masterson   14 March 1977
Because each row in the table needs to tell us who won a
particular Tournament in a particular Year, the composite key
{Tournament, Year} is a minimal set of attributes guaranteed to
uniquely identify a row. That is, {Tournament, Year} is a
candidate key for the table.
The breach of 3NF occurs because the non-prime attribute
Winner Date of Birth is transitively dependent on the candidate
key {Tournament, Year} via the non-prime attribute Winner. The
fact that Winner Date of Birth is functionally dependent on
Winner makes the table vulnerable to logical inconsistencies, as
there is nothing to stop the same person from being shown with
different dates of birth on different records.
In order to express the same facts without violating 3NF, it is
necessary to split the table into two:
Tournament Winners
Tournament             Year   Winner
Indiana Invitational   1998   Al Fredrickson
Cleveland Open         1999   Bob Albertson
Des Moines Masters     1999   Al Fredrickson
Indiana Invitational   1999   Chip Masterson

Player Dates of Birth
Player           Date of Birth
Chip Masterson   14 March 1977
Al Fredrickson   21 July 1975
Bob Albertson    28 September 1968
Update anomalies cannot occur in these tables, which are both
in 3NF.
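Zaniolo's three conditions translate into a mechanical test, sketched below. The functions closure and violations_3nf are hypothetical helpers written for this illustration; the superkey test uses the standard attribute-closure computation, and the functional dependencies and candidate keys are supplied by hand rather than discovered from data.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violations_3nf(attributes, fds, candidate_keys):
    """Return (X, A) pairs for which none of the three conditions holds."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        for a in rhs - lhs:                      # skip the trivial part
            is_superkey = closure(lhs, fds) >= attributes
            if not is_superkey and a not in prime:
                bad.append((lhs, a))
    return bad

# The Tournament Winners table before the split.
attributes = {"Tournament", "Year", "Winner", "Winner Date of Birth"}
fds = [({"Tournament", "Year"}, {"Winner"}),
       ({"Winner"}, {"Winner Date of Birth"})]
before = violations_3nf(attributes, fds, [{"Tournament", "Year"}])

# The two tables after the split.
after_winners = violations_3nf(
    {"Tournament", "Year", "Winner"},
    [({"Tournament", "Year"}, {"Winner"})],
    [{"Tournament", "Year"}])
after_players = violations_3nf(
    {"Player", "Date of Birth"},
    [({"Player"}, {"Date of Birth"})],
    [{"Player"}])
```

The only violation reported for the original table is the dependency of Winner Date of Birth on Winner; both tables of the decomposition pass the test.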
8.4 FOURTH NORMAL FORM

Fourth normal form (4NF) is a normal form used in
database normalization. Introduced by Ronald Fagin in 1977, 4NF
is the next level of normalization after Boyce-Codd normal form
(BCNF). Whereas the second, third, and Boyce-Codd normal forms
are concerned with functional dependencies, 4NF is concerned with
a more general type of dependency known as a multivalued
dependency. A table is in 4NF if and only if, for every one of its
non-trivial multivalued dependencies X ↠ Y, X is a superkey;
that is, X is either a candidate key or a superset thereof.
Consider the following example:
Pizza Delivery Permutations
Restaurant         Pizza Variety   Delivery Area
A1 Pizza           Thick Crust     Springfield
A1 Pizza           Thick Crust     Shelbyville
A1 Pizza           Thick Crust     Capital City
A1 Pizza           Stuffed Crust   Springfield
A1 Pizza           Stuffed Crust   Shelbyville
A1 Pizza           Stuffed Crust   Capital City
Elite Pizza        Thin Crust      Capital City
Elite Pizza        Stuffed Crust   Capital City
Vincenzo's Pizza   Thick Crust     Springfield
Vincenzo's Pizza   Thick Crust     Shelbyville
Vincenzo's Pizza   Thin Crust      Springfield
Vincenzo's Pizza   Thin Crust      Shelbyville
Each row indicates that a given restaurant can deliver a given
variety of pizza to a given area.
The table has no non-key attributes because its only key is
{Restaurant, Pizza Variety, Delivery Area}. Therefore it meets all
normal forms up to BCNF. If we assume, however, that pizza
varieties offered by a restaurant are not affected by delivery area,
then it does not meet 4NF. The problem is that the table features
two non-trivial multivalued dependencies on the {Restaurant}
attribute (which is not a superkey). The dependencies are:
{Restaurant} ↠ {Pizza Variety}
{Restaurant} ↠ {Delivery Area}
These non-trivial multivalued dependencies on a non-superkey
reflect the fact that the varieties of pizza a restaurant offers are
independent from the areas to which the restaurant delivers. This
state of affairs leads to redundancy in the table: for example, we
are told three times that A1 Pizza offers Stuffed Crust, and if A1
Pizza starts producing Cheese Crust pizzas then we will need to
add multiple rows, one for each of A1 Pizza's delivery areas. There
is, moreover, nothing to prevent us from doing this incorrectly: we
might add Cheese Crust rows for all but one of A1 Pizza's delivery
areas, thereby failing to respect the multivalued dependency
{Restaurant} ↠ {Pizza Variety}.
To eliminate the possibility of these anomalies, we must place
the facts about varieties offered into a different table from the facts
about delivery areas, yielding two tables that are both in 4NF:
Varieties By Restaurant
Restaurant         Pizza Variety
A1 Pizza           Thick Crust
A1 Pizza           Stuffed Crust
Elite Pizza        Thin Crust
Elite Pizza        Stuffed Crust
Vincenzo's Pizza   Thick Crust
Vincenzo's Pizza   Thin Crust

Delivery Areas By Restaurant
Restaurant         Delivery Area
A1 Pizza           Springfield
A1 Pizza           Shelbyville
A1 Pizza           Capital City
Elite Pizza        Capital City
Vincenzo's Pizza   Springfield
Vincenzo's Pizza   Shelbyville
In contrast, if the pizza varieties offered by a restaurant
sometimes did legitimately vary from one delivery area to another,
the original three-column table would satisfy 4NF.
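Whether the multivalued dependency holds in a given instance can be checked by projecting the table and joining the projections back. The rows are the sample data of this section; the variable names are chosen for this sketch. The check relies on the standard result that a relation satisfies X ↠ Y exactly when it equals the join of its projections on X with Y and on X with the remaining attributes.

```python
A1, ELITE, VINCENZO = "A1 Pizza", "Elite Pizza", "Vincenzo's Pizza"

# (Restaurant, Pizza Variety, Delivery Area) rows of the original table.
rows = {
    (A1, "Thick Crust", "Springfield"),
    (A1, "Thick Crust", "Shelbyville"),
    (A1, "Thick Crust", "Capital City"),
    (A1, "Stuffed Crust", "Springfield"),
    (A1, "Stuffed Crust", "Shelbyville"),
    (A1, "Stuffed Crust", "Capital City"),
    (ELITE, "Thin Crust", "Capital City"),
    (ELITE, "Stuffed Crust", "Capital City"),
    (VINCENZO, "Thick Crust", "Springfield"),
    (VINCENZO, "Thick Crust", "Shelbyville"),
    (VINCENZO, "Thin Crust", "Springfield"),
    (VINCENZO, "Thin Crust", "Shelbyville"),
}

# The two 4NF tables are projections of the original.
varieties = {(r, v) for r, v, _ in rows}
areas = {(r, a) for r, _, a in rows}

# Natural join of the projections on Restaurant.
rejoined = {(r, v, a)
            for r, v in varieties
            for r2, a in areas
            if r == r2}

mvd_holds = rejoined == rows
```

If A1 Pizza starts producing Cheese Crust pizzas, the decomposed design needs one new row in varieties, whereas the three-column table needs one row per delivery area.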
8.5 FIFTH NORMAL FORM

Fifth normal form (5NF), also known as Project-join
normal form (PJ/NF), is a level of database normalization
designed to reduce redundancy in relational databases recording
multi-valued facts by isolating semantically related multiple
relationships. A table is said to be in 5NF if and only if every join
dependency in it is implied by the candidate keys.
A join dependency *{A, B, …, Z} on R is implied by the candidate
key(s) of R if and only if each of A, B, …, Z is a superkey for R.
Consider the following example:
Travelling Salesman Product Availability By Brand
Travelling Salesman   Brand     Product Type
Jack Schneider        Acme      Vacuum Cleaner
Jack Schneider        Acme      Breadbox
Willy Loman           Robusto   Pruning Shears
Willy Loman           Robusto   Vacuum Cleaner
Willy Loman           Robusto   Breadbox
Willy Loman           Robusto   Umbrella Stand
Louis Ferguson        Robusto   Vacuum Cleaner
Louis Ferguson        Robusto   Telescope
Louis Ferguson        Acme      Vacuum Cleaner
Louis Ferguson        Acme      Lava Lamp
Louis Ferguson        Nimbus    Tie Rack
The table's predicate is: Products of the type designated by
Product Type, made by the brand designated by Brand, are
available from the travelling salesman designated by Travelling
Salesman.
In the absence of any rules restricting the valid possible
combinations of Travelling Salesman, Brand, and Product Type, the
three-attribute table above is necessary in order to model the
situation correctly.
Suppose, however, that the following rule applies: A
Travelling Salesman has certain Brands and certain Product Types
in his repertoire. If Brand B is in his repertoire, and Product Type P
is in his repertoire, then (assuming Brand B makes Product Type
P), the Travelling Salesman must offer products of Product Type P
made by Brand B.
In that case, it is possible to split the table into three:
Product Types By Travelling Salesman
Travelling Salesman  Product Type
Jack Schneider       Vacuum Cleaner
Jack Schneider       Breadbox
Willy Loman          Pruning Shears
Willy Loman          Vacuum Cleaner
Willy Loman          Breadbox
Willy Loman          Umbrella Stand
Louis Ferguson       Telescope
Louis Ferguson       Vacuum Cleaner
Louis Ferguson       Lava Lamp
Louis Ferguson       Tie Rack

Brands By Travelling Salesman
Travelling Salesman  Brand
Jack Schneider       Acme
Willy Loman          Robusto
Louis Ferguson       Robusto
Louis Ferguson       Acme
Louis Ferguson       Nimbus

Product Types By Brand
Brand    Product Type
Acme     Vacuum Cleaner
Acme     Breadbox
Acme     Lava Lamp
Robusto  Pruning Shears
Robusto  Vacuum Cleaner
Robusto  Breadbox
Robusto  Umbrella Stand
Robusto  Telescope
Nimbus   Tie Rack
Note how this setup helps to remove redundancy.
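The claim that the three smaller tables carry the same information as the original can be checked mechanically. The following is a minimal sketch, assuming the rows listed in the tables of this section; the variable and function names are illustrative only. It joins the three projections on their shared attributes and compares the result with the original three-attribute table.

```python
# Rows copied from the three two-column tables in this section.
product_types_by_salesman = {
    ("Jack Schneider", "Vacuum Cleaner"), ("Jack Schneider", "Breadbox"),
    ("Willy Loman", "Pruning Shears"), ("Willy Loman", "Vacuum Cleaner"),
    ("Willy Loman", "Breadbox"), ("Willy Loman", "Umbrella Stand"),
    ("Louis Ferguson", "Telescope"), ("Louis Ferguson", "Vacuum Cleaner"),
    ("Louis Ferguson", "Lava Lamp"), ("Louis Ferguson", "Tie Rack"),
}
brands_by_salesman = {
    ("Jack Schneider", "Acme"), ("Willy Loman", "Robusto"),
    ("Louis Ferguson", "Robusto"), ("Louis Ferguson", "Acme"),
    ("Louis Ferguson", "Nimbus"),
}
product_types_by_brand = {
    ("Acme", "Vacuum Cleaner"), ("Acme", "Breadbox"), ("Acme", "Lava Lamp"),
    ("Robusto", "Pruning Shears"), ("Robusto", "Vacuum Cleaner"),
    ("Robusto", "Breadbox"), ("Robusto", "Umbrella Stand"),
    ("Robusto", "Telescope"), ("Nimbus", "Tie Rack"),
}

# Rows of the original (salesman, brand, product type) table.
original = {
    ("Jack Schneider", "Acme", "Vacuum Cleaner"),
    ("Jack Schneider", "Acme", "Breadbox"),
    ("Willy Loman", "Robusto", "Pruning Shears"),
    ("Willy Loman", "Robusto", "Vacuum Cleaner"),
    ("Willy Loman", "Robusto", "Breadbox"),
    ("Willy Loman", "Robusto", "Umbrella Stand"),
    ("Louis Ferguson", "Robusto", "Vacuum Cleaner"),
    ("Louis Ferguson", "Robusto", "Telescope"),
    ("Louis Ferguson", "Acme", "Vacuum Cleaner"),
    ("Louis Ferguson", "Acme", "Lava Lamp"),
    ("Louis Ferguson", "Nimbus", "Tie Rack"),
}


def join_three(sp, sb, bp):
    """Natural join of the three projections on their shared attributes."""
    return {
        (s, b, p)
        for (s, p) in sp
        for (s2, b) in sb
        if s2 == s
        if (b, p) in bp
    }


rejoined = join_three(product_types_by_salesman, brands_by_salesman,
                      product_types_by_brand)
print(rejoined == original)
```

The join only reproduces the original rows because the rule stated above holds for this data; without that rule the join could produce combinations that no salesman actually offers.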
8.6 BCNF
Boyce-Codd normal form (or BCNF or 3.5NF) is a normal
form used in database normalization. It is a slightly stronger version
of the third normal form (3NF). A table is in Boyce-Codd normal
form if and only if for every one of its non-trivial functional
dependencies X → Y, X is a superkey; that is, X is either a
candidate key or a superset thereof.
Only in rare cases does a 3NF table not meet the
requirements of BCNF. A 3NF table which does not have multiple
overlapping candidate keys is guaranteed to be in BCNF.
Depending on what its functional dependencies are, a 3NF table
with two or more overlapping candidate keys may or may not be in
BCNF.
An example of a 3NF table that does not meet BCNF is:
Today's Court Bookings
Court  Start Time  End Time  Rate Type
1      09:30       10:30     SAVER
1      11:00       12:00     SAVER
1      14:00       15:30     STANDARD
2      10:00       11:30     PREMIUM-B
2      11:30       13:30     PREMIUM-B
2      15:00       16:30     PREMIUM-A
Each row in the table represents a court booking at a tennis club
that has one hard court (Court 1) and one grass court (Court 2).
A booking is defined by its Court and the period for which the
Court is reserved.
Additionally, each booking has a Rate Type associated with it.
There are four distinct rate types:
SAVER, for Court 1 bookings made by members
STANDARD, for Court 1 bookings made by non-members
PREMIUM-A, for Court 2 bookings made by members
PREMIUM-B, for Court 2 bookings made by non-members
The table's candidate keys are:
{Court, Start Time}
{Court, End Time}
{Rate Type, Start Time}
{Rate Type, End Time}
Recall that 2NF prohibits partial functional dependencies of
non-prime attributes on candidate keys, and that 3NF prohibits
transitive functional dependencies of non-prime attributes on
candidate keys. In the Today's Court Bookings table, there are no
non-prime attributes: that is, all attributes belong to candidate keys.
Therefore the table adheres to both 2NF and 3NF.
The table does not adhere to BCNF. This is because of the
dependency Rate Type → Court, in which the determining attribute
(Rate Type) is neither a candidate key nor a superset of a
candidate key.
Any table that falls short of BCNF will be vulnerable to logical
inconsistencies. In this example, enforcing the candidate keys will
not ensure that the dependency Rate Type → Court is respected.
There is, for instance, nothing to stop us from assigning a
PREMIUM-A Rate Type to a Court 1 booking as well as a Court 2
booking, a clear contradiction, as a Rate Type should only ever
apply to a single Court.
The design can be amended so that it meets BCNF:
Today's Bookings
Court  Start Time  End Time  Member Flag
1      09:30       10:30     Yes
1      11:00       12:00     Yes
1      14:00       15:30     No
2      10:00       11:30     No
2      11:30       13:30     No
2      15:00       16:30     Yes

Rate Types
Rate Type  Court  Member Flag
SAVER      1      Yes
STANDARD   1      No
PREMIUM-A  2      Yes
PREMIUM-B  2      No
The candidate keys for the Rate Types table are {Rate Type}
and {Court, Member Flag}; the candidate keys for the Today's
Bookings table are {Court, Start Time} and {Court, End Time}. Both
tables are in BCNF. Having one Rate Type associated with two
different Courts is now impossible, so the anomaly affecting the
original table has been eliminated.
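The anomaly in the original single-table design can be illustrated on the sample data. The sketch below is only an instance-level check (it inspects the rows shown, not the schema), and the attribute names, the helper functions and the extra "bad" row are assumptions made for illustration. It shows that a contradictory row can be added without violating any of the four candidate keys, while the dependency Rate Type → Court is broken.

```python
# Rows of the original Today's Court Bookings table (attribute names are
# hypothetical short forms of the column headings).
bookings = [
    {"court": 1, "start": "09:30", "end": "10:30", "rate": "SAVER"},
    {"court": 1, "start": "11:00", "end": "12:00", "rate": "SAVER"},
    {"court": 1, "start": "14:00", "end": "15:30", "rate": "STANDARD"},
    {"court": 2, "start": "10:00", "end": "11:30", "rate": "PREMIUM-B"},
    {"court": 2, "start": "11:30", "end": "13:30", "rate": "PREMIUM-B"},
    {"court": 2, "start": "15:00", "end": "16:30", "rate": "PREMIUM-A"},
]

CANDIDATE_KEYS = [("court", "start"), ("court", "end"),
                  ("rate", "start"), ("rate", "end")]


def fd_holds(rows, lhs, rhs):
    """True if, in these rows, equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_unique(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(keys) == len(set(keys))


# A made-up contradictory booking: PREMIUM-A is meant for Court 2 only.
bad_row = {"court": 1, "start": "16:45", "end": "17:45", "rate": "PREMIUM-A"}
with_bad_row = bookings + [bad_row]

keys_still_ok = all(is_unique(with_bad_row, k) for k in CANDIDATE_KEYS)
fd_still_ok = fd_holds(with_bad_row, ("rate",), ("court",))
print(keys_still_ok, fd_still_ok)
```

In the decomposed design the same mistake cannot be recorded, because the Rate Types table allows only one Court per Rate Type.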
8.7 COMPARISON OF 3NF AND BCNF

Boyce-Codd normal form (BCNF) is a normal form that provides
criteria for determining a table's degree of vulnerability to logical
inconsistencies and anomalies. It is used in database
normalisation and is slightly stronger than its predecessor, the
third normal form (3NF). A table is in BCNF if and only if, for every
one of its non-trivial functional dependencies X → Y, X is a
superkey. Here a functional dependency is a constraint between
two sets of attributes in a relation, and a superkey is a set of
attributes of a relational variable such that, in all relations assigned
to that variable, no two distinct rows contain the same values for
the attributes in that set. Any table that fails to meet the criteria for
BCNF is vulnerable to logical inconsistencies.
3NF is a normal form that is also used in database
normalisation. A table is in 3NF if and only if 1) the table is in
second normal form (2NF, that is, a table in first normal form, 1NF,
that also meets the criteria for 2NF), and 2) every non-prime
attribute of the table is non-transitively dependent on every key of
the table (meaning it depends on each key directly, not by way of
another attribute). There is another formulation of 3NF that is also
used to define the differences between 3NF and BCNF.
This formulation was given by Carlo Zaniolo in 1982. It
states that a table is in 3NF if and only if, for each functional
dependency X → A, at least one of three conditions holds:
X contains A (the dependency is trivial), X is a superkey, or A is a
prime attribute (which means A is contained within a candidate
key, that is, a minimal superkey of the relation). The definition of
BCNF differs in that it simply eliminates the last condition. The
link between Zaniolo's formulation and the original definition can
be seen as follows. Suppose X → A is non-trivial, let A be a
non-key attribute and let Y be a key of R. Then Y → X holds. This
means that A is not transitively dependent on Y if and only if
X → Y, that is, if X is a superkey.
Summary:
BCNF is a normal form in which, for every one of a table's
non-trivial functional dependencies X → Y, X is a superkey; 3NF is
a normal form in which the table is in 2NF and every non-prime
attribute is non-transitively dependent on every key in the table.
8.8 LOSSLESS AND DEPENDENCY PRESERVING DECOMPOSITION
8.8.1 Lossless-Join Decomposition
Let R be a relation schema and let F be a set of FDs over
R. A decomposition of R into two schemas with attribute sets X and
Y is said to be a lossless-join decomposition with respect to F if, for
every instance r of R that satisfies the dependencies in F,
π_X(r) ⋈ π_Y(r) = r. In other words, we can recover the
original relation from the decomposed relations.
This definition can easily be extended to cover a
decomposition of R into more than two relations. It is easy to see
that r ⊆ π_X(r) ⋈ π_Y(r) always holds.
In general, though, the other direction does not hold. If we
take projections of a relation and recombine them using natural
join, we typically obtain some tuples that were not in the original
relation.
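The difference between a lossy and a lossless decomposition can be seen on a tiny made-up instance. The sketch below assumes a relation over attributes (A, B, C) with two rows; the schema, the data and the helper names are illustrative only. Splitting on the shared attribute B produces extra tuples when the projections are joined back, while splitting on A (which determines B and C in this instance) recovers the original.

```python
SCHEMA = ("A", "B", "C")  # hypothetical three-attribute relation


def project(rows, attrs):
    """Projection of a set of SCHEMA-ordered tuples onto the given attributes."""
    idx = [SCHEMA.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(left, left_attrs, right, right_attrs):
    """Natural join of two projections; together they must cover SCHEMA."""
    common = [a for a in left_attrs if a in right_attrs]
    joined = set()
    for left_row in left:
        lvals = dict(zip(left_attrs, left_row))
        for right_row in right:
            rvals = dict(zip(right_attrs, right_row))
            if all(lvals[a] == rvals[a] for a in common):
                merged = {**lvals, **rvals}
                joined.add(tuple(merged[a] for a in SCHEMA))
    return joined


r = {(1, "x", 10), (2, "x", 20)}

# Decomposition into (A, B) and (B, C): both rows share B = "x".
lossy = natural_join(project(r, ("A", "B")), ("A", "B"),
                     project(r, ("B", "C")), ("B", "C"))

# Decomposition into (A, B) and (A, C): A is unique in every row.
lossless = natural_join(project(r, ("A", "B")), ("A", "B"),
                        project(r, ("A", "C")), ("A", "C"))

print(sorted(lossy))      # contains (1, "x", 20) and (2, "x", 10) as well
print(lossless == r)
```

Note that the original rows are always contained in the join; the problem with a lossy decomposition is the additional tuples, not missing ones.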
8.9 CLOSURE OF DEPENDENCIES

1. We need to consider all functional dependencies that hold. Given
a set F of functional dependencies, we can prove that certain other
ones also hold. We say these ones are logically implied by F.
2. Suppose we are given a relation scheme R = (A, B, C, G, H, I),
and the set of functional dependencies:
A → B
A → C
CG → H
CG → I
B → H
Then the functional dependency A → H is logically implied.
3. To see why, let t1 and t2 be tuples such that t1[A] = t2[A].
As we are given A → B, it follows that we must also have
t1[B] = t2[B].
Further, since we also have B → H, we must also have
t1[H] = t2[H].
Thus, whenever two tuples have the same value on A, they
must also have the same value on H, and we can say that
A → H.
4. The closure of a set F of functional dependencies is the set of all
functional dependencies logically implied by F.
5. We denote the closure of F by F+.
6. To compute F+, we can use some rules of inference called
Armstrong's Axioms:
Reflexivity rule: if α is a set of attributes and β ⊆ α, then
α → β holds.
Augmentation rule: if α → β holds, and γ is a set of attributes,
then γα → γβ holds.
Transitivity rule: if α → β holds, and β → γ holds, then
α → γ holds.
7. These rules are sound because they do not generate any
incorrect functional dependencies. They are also complete as they
generate all of F+.
8. To make life easier we can use some additional rules, derivable
from Armstrong's Axioms:
Union rule: if α → β and α → γ hold, then α → βγ holds.
Decomposition rule: if α → βγ holds, then α → β and α → γ
both hold.
Pseudotransitivity rule: if α → β holds, and γβ → δ holds,
then αγ → δ holds.
9. Applying these rules to the scheme and set F mentioned above,
we can derive the following:
A → H, as we saw by the transitivity rule.
CG → HI by the union rule.
AG → I by several steps:
- Note that A → C holds.
- Then AG → CG, by the augmentation rule.
- Now, since CG → I, by transitivity AG → I.
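In practice, a derivation such as the one above is usually checked by computing the closure of a set of attributes rather than the whole of F+: a dependency X → Y is implied by F exactly when Y is contained in the attribute closure of X. The following is a minimal sketch of that computation for the scheme R = (A, B, C, G, H, I) used above; the function name and the encoding of each FD as a pair of strings are assumptions made for illustration.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs.

    fds is a list of (left, right) pairs; each side is a string whose
    characters are single-letter attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            # If the whole left side is already known, add the right side.
            if set(left) <= result and not set(right) <= result:
                result |= set(right)
                changed = True
    return result


# The set F from the example: A→B, A→C, CG→H, CG→I, B→H.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(sorted(attribute_closure("A", F)))   # H is included, so A → H
print(sorted(attribute_closure("AG", F)))  # I is included, so AG → I
```

The loop repeatedly applies the given dependencies until nothing new can be added, which mirrors repeated use of the transitivity and augmentation rules.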
8.10 MINIMAL CLOSURE (COVER)

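A minimal cover for a set F of FDs is a set G of FDs such that:
1. Every dependency in G is of the form X → A, where A is a
single attribute.
2. The closure F+ is equal to the closure G+.
3. If we obtain a set H of dependencies from G by deleting one or
more dependencies, or by deleting attributes from a dependency
in G, then F+ ≠ H+.
Intuitively, a minimal cover for a set F of FDs is an equivalent set of
dependencies that is minimal in two respects: (1) Every
dependency is as small as possible; that is, each attribute on the
left side is necessary and the right side is a single attribute. (2)
Every dependency in it is required for the closure to be equal to F+.
As an example, let F be the set of dependencies:
A → B, ABCD → E, EF → G, EF → H, and ACDF → EG.
First let us rewrite ACDF → EG so that every right side is a
single attribute: ACDF → E and ACDF → G.
Next consider ACDF → G. This dependency is implied by the
following FDs: A → B, ABCD → E, and EF → G.
Therefore, we can delete it. Similarly, we can delete
ACDF → E. Next consider ABCD → E. Since A → B holds, we
can replace it with ACD → E. (At this point, the reader should
verify that each remaining FD is minimal and required.) Thus, a
minimal cover for F is the set:
A → B, ACD → E, EF → G, and EF → H.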
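The three steps of the walk-through (split right sides, shrink left sides, drop redundant dependencies) can be sketched in code. This is an illustrative sketch, not the only possible procedure: minimal covers are not unique, and the result can depend on the order in which dependencies are examined. The example set F below (A → B, ABCD → E, EF → G, EF → H, ACDF → EG) is assumed to be the one the walk-through describes; the function names and the string encoding of FDs are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (frozenset_left, single_attr)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if left <= result and right not in result:
                result.add(right)
                changed = True
    return result


def minimal_cover(fds):
    """fds: list of (left, right) strings of single-letter attributes."""
    # Step 1: make every right side a single attribute.
    g = [(frozenset(left), a) for left, right in fds for a in right]

    # Step 2: remove left-side attributes that are not needed.
    for i, (left, a) in enumerate(g):
        for attr in sorted(left):
            smaller = left - {attr}
            if smaller and a in closure(smaller, g):
                left = smaller
                g[i] = (left, a)
    g = list(dict.fromkeys(g))  # drop duplicates, keep order

    # Step 3: remove dependencies implied by the remaining ones.
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g


F = [("A", "B"), ("ABCD", "E"), ("EF", "G"), ("EF", "H"), ("ACDF", "EG")]
cover = minimal_cover(F)
for left, right in cover:
    print("".join(sorted(left)), "->", right)
```

For this input the sketch ends with four dependencies: A → B, ACD → E, EF → G and EF → H.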
Summary of Schema Refinement
If a relation is in BCNF, it is free of redundancies that can be
detected using FDs. Thus, trying to ensure that all relations are
in BCNF is a good heuristic.
If a relation is not in BCNF, we can try to decompose it into a
collection of BCNF relations.
We must consider whether all FDs are preserved. If a lossless-join,
dependency-preserving decomposition into BCNF is not
possible (or unsuitable, given typical queries), we should consider
decomposition into 3NF.
Decompositions should be carried out and/or re-examined while
keeping performance requirements in mind.
9
TRANSACTION PROCESSING
Topics covered
9.1 Transaction Concurrency Control
9.2 Recovery of Transaction Failure
9.3 Serializability
9.4 LOG based recovery
9.5 Locking Techniques
9.6 Granularity in locks
9.7 Time Stamping techniques
9.8 Two phase locking system
9.9 Deadlock handling
9.1 TRANSACTION CONCURRENCY CONTROL

In computer science, especially in the fields of computer
programming (see also concurrent programming, parallel
programming), operating systems (see also parallel computing),
multiprocessors, and databases, concurrency control ensures
that correct results for concurrent operations are generated, while
getting those results as quickly as possible. Computer systems,
both software and hardware, consist of modules, or components.
Each component is designed to meet certain consistency rules.
When components that operate concurrently interact by messaging
or by sharing accessed data (in memory or storage), a certain
component's consistency may be violated by another component.
The general area of concurrency control provides rules, methods,
and design methodologies to maintain the consistency of
components operating concurrently while interacting, and thus the
consistency and correctness of the whole system. Introducing
concurrency control into a system means applying operation
constraints with related performance overhead. Operation
consistency and correctness should be achieved together with
operation efficiency.
Concurrency control in database management systems
(DBMS), other transactional objects (objects with states accessed
and modified by database transactions), and related distributed
applications (e.g., Grid computing and Cloud computing) ensures
that database transactions are performed concurrently without the
concurrency violating the data integrity of the respective databases.
Thus concurrency control is an essential component for correctness
in any system where two database transactions or more can
access the same data concurrently, e.g., virtually in any
general-purpose database system. A database transaction is defined as an
object that meets the ACID rules described below when executed.
A DBMS usually guarantees that only serializable transaction
schedules (i.e., schedules that are equivalent to serial schedules,
where transactions are executed serially, one after another, with no
overlap in time; have the serializability property) are generated, for
correctness (unless Serializability is intentionally relaxed). For
maintaining correctness in cases of failed transactions (which can
always happen) schedules also need to be recoverable (have the
recoverability property). A DBMS also guarantees that no effect of
committed transactions is lost, and no effect of aborted (rolled
back) transactions remains in the related database (maintains
transactions' atomicity, i.e., "all or nothing" semantics).
9.2 RECOVERY OF TRANSACTION FAILURE

Process of restoring database to a correct state in the event of a
failure.
Need for Recovery Control
Two types of storage: volatile (main memory) and nonvolatile.
Volatile storage does not survive system crashes.
Stable storage represents information that has been replicated in
several nonvolatile storage media with independent failure modes.
9.2.1 Transaction and recovery:
Transactions represent basic unit of recovery.
Recovery manager responsible for atomicity and durability.
If failure occurs between commit and database buffers being
flushed to secondary storage then, to ensure durability, recovery
manager has to redo (roll forward) transaction's updates.
If transaction had not committed at failure time, recovery manager
has to undo (rollback) any effects of that transaction for atomicity.
Partial undo - only one transaction has to be undone.
Global undo - all transactions have to be undone.
9.2.1.1 DBMS should provide following facilities to assist with
recovery:
Backup mechanism, which makes periodic backup copies of
database.
Logging facilities, which keep track of current state of transactions
and database changes.
Checkpoint facility, which enables updates to database in
progress to be made permanent.
Recovery manager, which allows DBMS to restore the database
to a consistent state following a failure.
9.2.2 Three main recovery techniques:
If database has been damaged we need to restore last
backup copy of database and reapply updates of committed
transactions using log file.
If database is only inconsistent we need to undo changes
that caused inconsistency. May also need to redo some
transactions to ensure updates reach secondary storage.
The Recovery techniques are as follows:
Deferred Update
Updates are not written to the database until after a
transaction has reached its commit point.
If transaction fails before commit, it will not have modified
database and so no undoing of changes required.
May be necessary to redo updates of committed
transactions as their effect may not have reached database.
Immediate Update
Updates are applied to database as they occur.
Need to redo updates of committed transactions following a
failure.
May need to undo effects of transactions that had not
committed at time of failure.
Essential that log records are written before write to
database. Write-ahead log protocol.
If no "transaction commit" record in log, then that
transaction was active at failure and must be undone.
Undo operations are performed in reverse order in which
they were written to log.
Shadow Paging
Maintain two page tables during life of a transaction:
current page table and shadow page table.
When transaction starts, the two page tables are the same.
Shadow page table is never changed thereafter and is
used to restore database in event of failure.
During transaction, current page table records all updates
to database.
When transaction completes, current page table becomes
shadow page table.
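The shadow paging steps listed above can be sketched with two in-memory page tables. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, the "disk" is a Python list, and updates are written to fresh physical pages (copy-on-write) so that the pages referenced by the shadow table are never overwritten, a detail the notes above do not spell out.

```python
class ShadowPagedStore:
    """Toy store with a current page table and a shadow page table."""

    def __init__(self, contents):
        # Physical pages on "disk"; the list index is the physical page number.
        self.disk = list(contents)
        # Shadow page table: logical page number -> physical page number.
        self.shadow = list(range(len(contents)))
        # At transaction start the current table is a copy of the shadow table.
        self.current = list(self.shadow)

    def read(self, logical):
        return self.disk[self.current[logical]]

    def write(self, logical, value):
        # Write to a new physical page; only the current table points to it.
        self.disk.append(value)
        self.current[logical] = len(self.disk) - 1

    def commit(self):
        # The current page table becomes the new shadow page table.
        self.shadow = list(self.current)

    def abort(self):
        # Failure or rollback: fall back to the untouched shadow page table.
        self.current = list(self.shadow)


store = ShadowPagedStore(["a", "b"])
store.write(0, "a2")
store.abort()              # the update to page 0 is discarded
after_abort = store.read(0)
store.write(1, "b2")
store.commit()             # the update to page 1 becomes permanent
store.abort()              # a later failure no longer affects it
after_commit = store.read(1)
print(after_abort, after_commit)
```

No undo or redo of individual updates is needed in this scheme: recovery simply reinstates the shadow page table.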
9.3 SERIALIZABILITY
In concurrency control of databases, transaction processing
(transaction management), and various transactional applications,
both centralized and distributed, a transaction schedule is
serializable, has the Serializability property, if its outcome (the
resulting database state, the values of the database's data) is equal
to the outcome of its transactions executed serially, i.e.,
sequentially without overlapping in time. Transactions are normally
executed concurrently (they overlap), since this is the most efficient
way. Serializability is the major correctness criterion for concurrent
transactions' executions. It is considered the highest level of
isolation between transactions, and plays an essential role in
concurrency control. As such it is supported in all general purpose
database systems.
The rationale behind serializability is the following:
If each transaction is correct by itself, i.e., meets certain
integrity conditions, then a schedule that comprises any serial
execution of these transactions is correct (its transactions still meet
their conditions): "Serial" means that transactions do not overlap in
time and cannot interfere with each other, i.e., complete isolation
between each other exists. Any order of the transactions is
legitimate, if no dependencies among them exist, which is assumed
(see comment below). As a result, a schedule that comprises any
execution (not necessarily serial) that is equivalent (in its outcome)
to any serial execution of these transactions, is correct.
Schedules that are not serializable are likely to generate
erroneous outcomes. Well known examples are with transactions
that debit and credit accounts with money: If the related schedules
are not serializable, then the total sum of money may not be
preserved. Money could disappear, or be generated from nowhere.
This and violations of possibly needed other invariant preservations
are caused by one transaction writing, and "stepping on" and
erasing what has been written by another transaction before it has
become permanent in the database. It does not happen if
serializability is maintained.
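The debit and credit example can be made concrete with a small simulation. The sketch below is illustrative only: the account names, the amounts and the encoding of a schedule as a list of steps are assumptions. Each transaction reads a balance into a private copy and later writes back that copy plus a change; in the interleaved schedule one transaction overwrites the other's debit, so the total amount of money changes.

```python
def run(schedule, db):
    """Execute steps (txn, op, account, delta) against a copy of db."""
    db = dict(db)
    local = {}  # private values read by each transaction
    for txn, op, account, delta in schedule:
        if op == "read":
            local[(txn, account)] = db[account]
        else:  # "write": store the value read earlier plus delta
            db[account] = local[(txn, account)] + delta
    return db


def transfer(txn, amount):
    """Steps of a transaction moving `amount` from account A to account B."""
    return [
        (txn, "read", "A", 0), (txn, "write", "A", -amount),
        (txn, "read", "B", 0), (txn, "write", "B", +amount),
    ]


start = {"A": 100, "B": 0}
t1, t2 = transfer("T1", 50), transfer("T2", 30)

serial = t1 + t2
# Both transactions read A before either writes it back.
interleaved = [t1[0], t2[0], t1[1], t2[1]] + t1[2:] + t2[2:]

serial_result = run(serial, start)
interleaved_result = run(interleaved, start)
print(serial_result, sum(serial_result.values()))
print(interleaved_result, sum(interleaved_result.values()))
```

The serial schedule keeps the total at 100, while the interleaved one ends with a total of 150: T2's write to A erased T1's debit, so money appears from nowhere.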
9.4 LOG BASED RECOVERY

The DBMS performs a commit or rollback with the help of a
file, known as the transaction log, which keeps track of old and new
versions of records. All the DBMS brands make use of a
transaction log.
When a user executes a SQL statement that modifies the
database: 1) The DBMS automatically writes a record in the
transaction log showing two copies of each row affected by the
statement; one copy shows the row before the change and the
other copy shows the row after the change. 2) Only after the log
record is written does the DBMS modify the row on the
disk. 3) If a commit occurs, the new version of the row is kept;
if a rollback occurs, the old version recorded in the log is restored.
If a system failure occurs, the system operator typically
recovers the data by running a special recovery utility supplied with
the DBMS. The recovery utility examines the end of the transaction
log, looking for the transactions that were not committed before
failure. The utility rolls back each of these incomplete transactions
so that only committed transactions are reflected in the database.
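What the recovery utility does with the log can be sketched as follows. This is a simplified illustration, not the procedure of any particular DBMS: the record layout, the function name and the sample data are assumptions. Each update record carries the before and after values of an item; uncommitted transactions are undone in reverse log order using the before values, and committed ones are redone in log order using the after values.

```python
def recover(log, db):
    """Return the database state after recovery from the given log.

    Log records (hypothetical layout):
      ("start", txn), ("update", txn, item, old_value, new_value),
      ("commit", txn)
    """
    db = dict(db)
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Undo: walk the log backwards, restoring before-images of
    # transactions that have no commit record.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _txn, item, old_value, _new_value = rec
            db[item] = old_value

    # Redo: walk the log forwards, reapplying after-images of
    # committed transactions in case they never reached the disk.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _txn, item, _old_value, new_value = rec
            db[item] = new_value
    return db


log = [
    ("start", "T1"),
    ("update", "T1", "X", 100, 50),
    ("commit", "T1"),
    ("start", "T2"),
    ("update", "T2", "Y", 10, 40),   # T2 never committed
]
# State on disk at the crash: T1's change was still in a buffer,
# T2's change had already been written.
disk_at_crash = {"X": 100, "Y": 40}

recovered = recover(log, disk_at_crash)
print(recovered)
```

After recovery X holds the committed value 50 and Y is back to 10, so only the committed transaction is reflected in the database.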
9.5 LOCKING TECHNIQUES

Virtually all major DBMS products use sophisticated locking
techniques to handle concurrent SQL transactions for many
simultaneous users.
When transaction A accesses the database, the DBMS
automatically locks each piece of the database that the transaction
retrieves or modifies. Transaction B proceeds in parallel, and the
DBMS also locks the pieces of the database that it accesses. If
Transaction B tries to access part of the database that has been
locked by Transaction A, the DBMS blocks Transaction B, causing
it to wait for the data to be unlocked. The DBMS releases the locks
held by Transaction A only when it ends in a COMMIT or
ROLLBACK operation. The DBMS then "unblocks" Transaction B,
allowing it to proceed. Transaction B can now lock that piece of the
database on its own behalf, protecting it from the effects of other
transactions.
9.5.1 Shared and Exclusive Locks
To increase concurrent access to a database, most
commercial DBMS products use a locking scheme with more than
one type of lock. A scheme using shared and exclusive locks is
quite common:
A shared lock is used by the DBMS when a transaction wants to
read data from the database. Another concurrent transaction can
also acquire a shared lock on the same data, allowing the other
transaction to also read the data.
An exclusive lock is used by the DBMS when a transaction wants
to update data in the database. When a transaction has an
exclusive lock on some data, other transactions cannot acquire any
type of lock (shared or exclusive) on the data.
The locking technique temporarily gives a transaction
exclusive access to a piece of a database, preventing other
transactions from modifying the locked data. Locking thus solves all
of the concurrent transaction problems. It prevents lost updates,
uncommitted data, and inconsistent data from corrupting the
database. However, locking introduces a new problem: it may
cause a transaction to wait for a long time while the pieces of the
database that it wants to access are locked by other transactions.
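The compatibility rules for shared and exclusive locks can be sketched as a small lock table. This is a minimal single-threaded illustration with hypothetical names; a real lock manager would queue waiting transactions instead of simply refusing the request. A request is granted if the item is unlocked, if every holder (including the requester) wants a shared lock, or if the requester is the only holder.

```python
class LockTable:
    """Toy lock table: item -> (mode, set of holding transactions)."""

    def __init__(self):
        self.locks = {}

    def acquire(self, txn, item, mode):
        """Return True if the lock is granted, False if txn must wait.

        mode is "S" (shared, for reading) or "X" (exclusive, for updating).
        """
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:
            # The sole holder may keep its lock or upgrade it to exclusive.
            if mode == "X":
                self.locks[item] = ("X", holders)
            return True
        if mode == "S" and held_mode == "S":
            holders.add(txn)       # shared locks are compatible
            return True
        return False               # any combination involving "X" conflicts

    def release_all(self, txn):
        """Release every lock held by txn (at COMMIT or ROLLBACK)."""
        for item in list(self.locks):
            _mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]


table = LockTable()
both_read = table.acquire("A", "row1", "S") and table.acquire("B", "row1", "S")
b_blocked = not table.acquire("B", "row1", "X")   # A still reads row1
table.release_all("A")                            # A commits
b_proceeds = table.acquire("B", "row1", "X")
print(both_read, b_blocked, b_proceeds)
```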
9.5.2 Locking parameters
Typical parameters are as follows:
Lock size. Some DBMS products offer a choice of table-level,
page-level, row-level, and other lock sizes. Depending on the
specific application, a different size lock may be appropriate.
Number of locks. A DBMS typically allows each transaction to
have some finite number of locks. The database administrator can
often set this limit, raising it to permit more complex transactions or
lowering it to encourage earlier lock escalation.
Lock escalation. A DBMS will often automatically "escalate"
locks, replacing many small locks with a single larger lock (for
example, replacing many page-level locks with a table-level lock).
The database administrator may have some control over this
escalation process.
Lock timeout. Even when a transaction is not deadlocked with
another transaction, it may wait a very long time for the other
transaction to release its locks. Some DBMS brands implement a
timeout feature, where a SQL statement fails with a SQL error code
if it cannot obtain the locks it needs within a certain period of time.
The timeout period can usually be set by the database
administrator.
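The lock timeout parameter can be illustrated with Python's standard `threading.Lock`, whose `acquire` method accepts a `timeout` argument. This is only an analogy for what a DBMS does internally: the function name, the return values and the 0.05 second limit are assumptions chosen for the example.

```python
import threading

row_lock = threading.Lock()  # stands in for a lock on one piece of data


def update_with_timeout(lock, timeout_seconds):
    """Try to obtain the lock; give up with an error value after the timeout."""
    if not lock.acquire(timeout=timeout_seconds):
        return "timeout"       # a DBMS would return a SQL error code here
    try:
        return "updated"       # the protected update would happen here
    finally:
        lock.release()


row_lock.acquire()                                   # "transaction A" holds the lock
result_blocked = update_with_timeout(row_lock, 0.05)  # "transaction B" gives up
row_lock.release()                                   # A ends
result_free = update_with_timeout(row_lock, 0.05)
print(result_blocked, result_free)
```

A short timeout keeps users from waiting indefinitely, at the cost of failing statements that would have succeeded after a slightly longer wait.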
9.6 GRANULARITY IN LOCKS

An important property of a lock is its granularity. The
granularity is a measure of the amount of data the lock is
protecting. In general, choosing a coarse granularity (a small
number of locks, each protecting a large segment of data) results in
less lock overhead when a single process is accessing the
protected data, but worse performance when multiple processes
are running concurrently. This is because of increased lock
contention. The more coarse the lock, the higher the likelihood
that the lock will stop an unrelated process from proceeding.
Conversely, using a fine granularity (a larger number of locks, each
protecting a fairly small amount of data) increases the overhead of
the locks themselves but reduces lock contention. More locks also
increase the risk of deadlock.
9.6.1 Levels of locking
Database level locking:
Locking can be implemented at various levels of the
database. In its crudest form, the DBMS could lock the entire
database for each transaction. This locking strategy would
be simple to implement, but it would allow processing of only
one transaction at a time.
Table level locking:
In this scheme, the DBMS locks only the tables accessed by
a transaction. Other transactions can concurrently access
other tables. This technique permits more parallel
processing, but still leads to unacceptably slow performance
in applications such as order entry, where many users must
share access to the same table or tables.
Page level locking:
Many DBMS products implement locking at the page level.
In this scheme, the DBMS locks individual blocks of data
("pages") from the disk as they are accessed by a
transaction. Other transactions are prevented from
accessing the locked pages but may access (and lock for
themselves) other pages of data. Page sizes of 2KB, 4KB,
and 16KB are commonly used. Since a large table will be
spread out over hundreds or thousands of pages, two
transactions trying to access two different rows of a table will
usually be accessing two different pages, allowing the two
transactions to proceed in parallel.
Row level locking:
Over the last several years, most of the major commercial
DBMS systems have moved beyond page-level locking to
row-level locks. Row-level locking allows two concurrent
transactions that access two different rows of a table to
proceed in parallel, even if the two rows fall in the same disk
block. While two rows sharing a block may seem a remote
possibility, it can be a real problem with small tables containing
small records.
Row-level locking provides a high degree of parallel
transaction execution. Unfortunately, keeping track of locks
on variable-length pieces of the database (in other words,
rows) rather than fixed-size pages is a much more complex
task, so increased parallelism comes at the cost of more
sophisticated locking logic and increased overhead.
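The effect of lock granularity on contention can be shown by asking which lock protects a given row at each level. The sketch below is illustrative: the function names and the figure of four rows per page are assumptions. Two transactions conflict when the rows they touch map to the same lock.

```python
ROWS_PER_PAGE = 4  # hypothetical; real pages hold a varying number of rows


def lock_id(row_number, granularity):
    """Identify the lock that protects a row at the given granularity."""
    if granularity == "table":
        return ("table",)
    if granularity == "page":
        return ("page", row_number // ROWS_PER_PAGE)
    if granularity == "row":
        return ("row", row_number)
    raise ValueError(f"unknown granularity: {granularity}")


def conflicts(row_a, row_b, granularity):
    """True if accesses to the two rows compete for the same lock."""
    return lock_id(row_a, granularity) == lock_id(row_b, granularity)


# Rows 1 and 2 sit on the same page; rows 1 and 9 sit on different pages.
print(conflicts(1, 2, "page"), conflicts(1, 2, "row"))
print(conflicts(1, 9, "table"), conflicts(1, 9, "page"))
```

Rows 1 and 2 collide under page-level locking but not under row-level locking, which is the case described above for small tables with small records.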
9.7 TIME STAMPING TECHNIQUES

A trusted timestamp is a timestamp issued by a trusted third
party (TTP) acting as a time stamping authority (TSA). It is used
to prove the existence of certain data before a certain point (e.g.
contracts, research data, medical records,...) without the possibility
that the owner can backdate the timestamps. Multiple TSAs can be
used to increase reliability and reduce vulnerability.
9.7.1 Creating a timestamp
The technique is based on digital signatures and hash
functions. First a hash is calculated from the data. A hash is a sort
of digital fingerprint of the original data: a string of bits that is
different for each set of data. If the original data is changed then
this will result in a completely different hash. This hash is sent to
the TSA. The TSA concatenates a timestamp to the hash and
calculates the hash of this concatenation. This hash is in turn
digitally signed with the private key of the TSA. This signed hash +
the timestamp is sent back to the requester of the timestamp who
stores these with the original data (see diagram).
Since the original data can not be calculated from the hash
(because the hash function is a one way function), the TSA never
gets to see the original data, which allows the use of this method
for confidential data.
Figure: Getting a timestamp from a trusted third party.

9.7.2 Checking the timestamp
Anyone trusting the timestamper can then verify that the
document was not created after the date that the timestamper
vouches. It can also no longer be repudiated that the requester of
the timestamp was in possession of the original data at the time
given by the timestamp. To prove this (see diagram) the hash of the
original data is calculated, the timestamp given by the TSA is
appended to it and the hash of the result of this concatenation is
calculated; call this hash A.
Then the digital signature of the TSA needs to be validated.
This can be done by checking that the signed hash provided by the
TSA was indeed signed with their private key by digital signature
verification. The hash A is compared with the hash B inside the
signed TSA message to confirm they are equal, proving that the
timestamp and message is unaltered and was issued by the TSA. If
not, then either the timestamp was altered or the timestamp was
not issued by the TSA.
Figure: Checking correctness of a timestamp generated by a
time stamping authority (TSA).
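The creation and checking steps can be sketched with Python's standard `hashlib` and `hmac` modules. One substitution should be stated plainly: the text describes a digital signature made with the TSA's private key, whereas this sketch uses an HMAC with a shared secret as a stand-in, because the standard library has no public-key signing. The key, the function names and the sample timestamp are hypothetical, and the sketch is not suitable for real use.

```python
import hashlib
import hmac

TSA_KEY = b"demo-only-secret"  # hypothetical; a real TSA signs with a private key


def tsa_issue(data_hash, timestamp):
    """TSA side: bind a timestamp to a hash without seeing the data itself."""
    combined = hashlib.sha256(data_hash + timestamp.encode()).digest()
    signature = hmac.new(TSA_KEY, combined, hashlib.sha256).digest()
    return timestamp, signature


def verify(data, timestamp, signature):
    """Recompute hash A from the data and timestamp and check the signature."""
    data_hash = hashlib.sha256(data).digest()
    hash_a = hashlib.sha256(data_hash + timestamp.encode()).digest()
    expected = hmac.new(TSA_KEY, hash_a, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)


document = b"contract text"
# The requester sends only the hash of the document to the TSA.
stamp, sig = tsa_issue(hashlib.sha256(document).digest(), "2024-01-01T00:00:00Z")

print(verify(document, stamp, sig))                  # unaltered
print(verify(b"contract text (edited)", stamp, sig))  # data changed
print(verify(document, "2023-01-01T00:00:00Z", sig))  # timestamp changed
```

With a real public-key signature, anyone holding the TSA's public key could perform the check; with the HMAC stand-in used here the verifier needs the same secret as the TSA.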
9.8 TWO PHASE LOCKING SYSTEM:

According to the two-phase locking protocol, a transaction
handles its locks in two distinct, consecutive phases during the
transaction's execution:
1. Expanding phase (number of locks can only increase): locks
are acquired and no locks are released.
2. Shrinking phase: locks are released and no locks are acquired.
The serializability property is guaranteed for a schedule with
transactions that obey the protocol. The 2PL schedule class is
defined as the class of all the schedules comprising transactions
with data access orders that could be generated by the 2PL
protocol.
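Whether a single transaction obeys the two-phase rule can be checked by scanning its lock and unlock operations in order. The following is a minimal sketch; the function name and the encoding of operations as (operation, item) pairs are assumptions made for illustration.

```python
def obeys_two_phase_locking(operations):
    """True if no lock is acquired after the first lock has been released.

    operations is a list of ("lock", item) / ("unlock", item) pairs in the
    order the transaction issues them.
    """
    shrinking = False
    for op, _item in operations:
        if op == "unlock":
            shrinking = True          # the shrinking phase has begun
        elif op == "lock" and shrinking:
            return False              # acquiring now would violate 2PL
    return True


two_phase = [("lock", "X"), ("lock", "Y"), ("unlock", "X"), ("unlock", "Y")]
not_two_phase = [("lock", "X"), ("unlock", "X"), ("lock", "Y"), ("unlock", "Y")]

print(obeys_two_phase_locking(two_phase))
print(obeys_two_phase_locking(not_two_phase))
```

The second sequence releases X before locking Y, so another transaction could change X in between and the resulting schedule might not be serializable.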
Typically, without explicit knowledge in a transaction of the end of
phase-1, the end is safely determined only when a transaction has
entered its ready state in all its processes (processing has ended,
and it is ready to be committed; no additional locking is possible). In
this case phase-2 can end immediately (no additional processing is
needed), and actually no phase-2 is needed. Also, if several
processes (two or more) are involved, then a synchronization point
(similar to atomic commitment) among them is needed to determine
the end of phase-1 for all of them (i.e., in the entire distributed
transaction), to start releasing locks in phase-2 (otherwise it is very
likely that both 2PL and serializability are quickly violated). Such a
synchronization point is usually too costly (involving a distributed
protocol similar to atomic commitment), and the end of phase-1 is
usually postponed to be merged with transaction end (atomic
commitment protocol for a multi-process transaction), and again
phase-2 is not needed. This turns 2PL into SS2PL (see below). All
known implementations of 2PL in products are SS2PL based.
9.8.1 Strict two-phase locking
The strict two-phase locking (S2PL) class of schedules is
the intersection of the 2PL class with the class of schedules
possessing the Strictness property.
To comply with the S2PL protocol a transaction needs to comply
with 2PL, and release its write (exclusive) locks only after it has
ended, i.e., being either committed or aborted. On the other hand,
read (shared) locks are released regularly during phase 2.
Implementing general S2PL requires explicit support of phase-1
end, separate from transaction end, and no such widely utilized
product implementation is known.
S2PL is a special case of 2PL, i.e., the S2PL class is a proper
subclass of 2PL.
9.8.2 Strong strict two-phase locking
or Rigorousness, or Rigorous scheduling, or Rigorous two-phase locking
To comply with strong strict two-phase locking (SS2PL)
the locking protocol releases both write (exclusive) and read
(shared) locks applied by a transaction only after the transaction
has ended, i.e., being either committed or aborted. This protocol
also complies with the S2PL rules. A transaction obeying SS2PL
can be viewed as having phase-1 that lasts the transaction's entire
execution duration, and no phase-2 (or a degenerate phase-2).
Thus, only one phase is actually left, and "two-phase" in the name
seems to be still utilized due to the historical development of the
concept from 2PL, and 2PL being a super-class. The SS2PL
property of a schedule is also called Rigorousness. It is also the
name of the class of schedules having this property, and an SS2PL
schedule is also called a "rigorous schedule". The term
"Rigorousness" is free of the unnecessary legacy of "two-phase,"
as well as being independent of any (locking) mechanism (in
principle other blocking mechanisms can be utilized). The
property's respective locking mechanism is sometimes referred to
as Rigorous 2PL.
9.9 DEADLOCK HANDLING

A deadlock is a situation wherein two or more competing
actions are each waiting for the other to finish, and thus neither
ever does.
An example of a deadlock which may occur in database
products is the following. Client applications using the database
may require exclusive access to a table, and in order to gain
exclusive access they ask for a lock. If one client application holds
a lock on a table and attempts to obtain the lock on a second table
that is already held by a second client application, this may lead to
deadlock if the second application then attempts to obtain the lock
that is held by the first application. (But this particular type of
deadlock is easily prevented, e.g., by using an all-or-none resource
allocation algorithm.)
Four conditions, known as the Coffman conditions from their first
description in a 1971 article by E. G. Coffman, must all hold for a
deadlock to occur.
1. Mutual exclusion condition: a resource cannot be used
by more than one process at a time
2. Hold and wait condition: processes already holding
resources may request new resources
3. No preemption condition: no resource can be forcibly
removed from a process holding it; resources can be
released only by the explicit action of the process
4. Circular wait condition: two or more processes form a
circular chain where each process waits for a resource that
the next process in the chain holds
9.9.1 Prevention
Removing the mutual exclusion condition means that no
process may have exclusive access to a resource. This proves
impossible for resources that cannot be spooled, and even with
spooled resources deadlock could still occur. Algorithms that
avoid mutual exclusion are called non-blocking synchronization
algorithms.
The "hold and wait" conditions may be removed by requiring

processes to request all the resources they will need before
starting up (or before embarking upon a particular set of
operations); this advance knowledge is frequently difficult to
satisfy and, in any case, is an inefficient use of resources.
Another way is to require processes to release all their
resources before requesting all the resources they will need.
This too is often impractical. (Such algorithms, such as
serializing tokens, are known as the all-or-none algorithms.)
A "no preemption" (lockout) condition may also be difficult or

impossible to avoid as a process has to be able to have a
resource for a certain amount of time, or the processing
outcome may be inconsistent or thrashing may occur. However,
inability to enforce preemption may interfere with a priority
algorithm. (Note: Preemption of a "locked out" resource
generally implies a rollback, and is to be avoided, since it is very
costly in overhead.) Algorithms that allow preemption include
lock-free and wait-free algorithms and optimistic concurrency
control.
The circular wait condition: Algorithms that avoid circular waits

include "disable interrupts during critical sections", and "use a
hierarchy to determine a partial ordering of resources" (where
no obvious hierarchy exists, even the memory address of
resources has been used to determine ordering) and Dijkstra's
solution.
9.9.2 Circular wait prevention

Circular wait prevention consists of allowing processes to
wait for resources, but ensure that the waiting can't be circular. One
approach might be to assign a precedence to each resource and
force processes to request resources in order of increasing
precedence. That is to say that if a process holds some resources
and the highest precedence of these resources is m, then this
process cannot request any resource with precedence smaller than
m. This forces resource allocation to follow a particular and
non-circular ordering, so circular wait cannot occur. Another approach is
to allow holding only one resource per process; if a process
requests another resource, it must first free the one it is currently
holding (that is, disallow hold-and-wait).
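The precedence rule can be sketched in a few lines of Python. This is only an illustration: the Resource class, the precedence numbers and the helper names are assumptions made for the example, not part of any particular DBMS.

```python
import threading

class Resource:
    """A lockable resource tagged with a fixed precedence (hypothetical class)."""
    def __init__(self, name, precedence):
        self.name = name
        self.precedence = precedence
        self.lock = threading.Lock()

def acquire_in_order(resources):
    """Lock the resources in increasing precedence, whatever order the caller lists them in."""
    ordered = sorted(resources, key=lambda r: r.precedence)
    for resource in ordered:
        resource.lock.acquire()
    return ordered

def release_all(held):
    """Release the locks in the reverse of the order in which they were taken."""
    for resource in reversed(held):
        resource.lock.release()
```

Because every process takes its locks in the same global order, no process can hold a higher-precedence resource while waiting for a lower one, so a circular chain of waits cannot form.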
9.9.3 Avoidance
Deadlock can be avoided if certain information about
processes is available in advance of resource allocation. For every
resource request, the system sees if granting the request will mean
that the system will enter an unsafe state, meaning a state that
could result in deadlock. The system then only grants requests that
will lead to safe states. In order for the system to be able to figure
out whether the next state will be safe or unsafe, it must know in
advance at any time the number and type of all resources in
existence, available, and requested. One known algorithm that is
used for deadlock avoidance is the Banker's algorithm, which
requires resource usage limit to be known in advance. However, for

many systems it is impossible to know in advance what every
process will request. This means that deadlock avoidance is often
impossible.
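The safety test at the heart of the Banker's algorithm can be sketched as follows. The function names and the list-of-lists layout (one row per process, one column per resource type) are assumptions chosen for this illustration.

```python
def is_safe(available, allocation, maximum):
    """Return True if some ordering lets every process run to completion."""
    work = list(available)
    need = [[m - a for m, a in zip(max_row, alloc_row)]
            for max_row, alloc_row in zip(maximum, allocation)]
    finished = [False] * len(allocation)
    progressed = True
    while progressed:
        progressed = False
        for i, row in enumerate(need):
            if not finished[i] and all(n <= w for n, w in zip(row, work)):
                # Process i can finish and hand back everything it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                progressed = True
    return all(finished)

def grant_if_safe(request, pid, available, allocation, maximum):
    """Grant a request only if it is within limits and leaves a safe state."""
    held = allocation[pid]
    if any(r > a for r, a in zip(request, available)):
        return False  # not enough free resources right now
    if any(h + r > m for h, r, m in zip(held, request, maximum[pid])):
        return False  # exceeds the usage limit declared in advance
    new_available = [a - r for a, r in zip(available, request)]
    new_allocation = [list(row) for row in allocation]
    new_allocation[pid] = [h + r for h, r in zip(held, request)]
    return is_safe(new_available, new_allocation, maximum)
```

The sketch makes the practical limitation visible: the maximum matrix must be supplied up front, which is exactly the information many systems do not have.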
Two other algorithms are Wait/Die and Wound/Wait, each of
which uses a symmetry-breaking technique. In both these
algorithms there exists an older process (O) and a younger process
(Y). Process age can be determined by a timestamp at process
creation time. Smaller time stamps are older processes, while
larger timestamps represent younger processes.
Situation                       Wait/Die    Wound/Wait
O needs a resource held by Y    O waits     Y dies
Y needs a resource held by O    Y dies      Y waits
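The two schemes differ only in what happens to the requester and the holder, which a small Python sketch makes explicit. The function names and the returned strings are assumptions for illustration; a smaller timestamp means an older process, as in the text.

```python
def wait_die(requester_ts, holder_ts):
    """Wait/Die: an older requester waits, a younger requester is rolled back."""
    if requester_ts < holder_ts:
        return "requester waits"
    return "requester dies"

def wound_wait(requester_ts, holder_ts):
    """Wound/Wait: an older requester forces the holder to roll back, a younger requester waits."""
    if requester_ts < holder_ts:
        return "holder dies"
    return "requester waits"
```

In both schemes only the younger process is ever rolled back, which is what breaks the symmetry and rules out a cycle of waits.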
It is important to note that the system may be in an unsafe
state without a deadlock actually resulting. The notion of safe/unsafe
states refers only to whether the system could enter a deadlock
state. For example, if a process requests A, which would
result in an unsafe state, but then releases B, which prevents
circular wait, then the state is unsafe but the system is not in
deadlock.
9.9.4 Detection
Often, neither avoidance nor deadlock prevention may be
used. Instead deadlock detection and process restart are used by
employing an algorithm that tracks resource allocation and process
states, and rolls back and restarts one or more of the processes in
order to remove the deadlock. Detecting a deadlock that has
already occurred is easily possible since the resources that each
process has locked and/or currently requested are known to the
resource scheduler or OS.
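One common way to detect an existing deadlock is to look for a cycle in a wait-for graph, in which an edge from P to Q means that P is waiting for a resource held by Q. The sketch below assumes the scheduler can supply that graph as a dictionary; the function name and the representation are illustrative only.

```python
def find_deadlock(wait_for):
    """Return the processes on one cycle of the wait-for graph, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {p: WHITE for p in wait_for}

    def visit(process, path):
        colour[process] = GREY
        path.append(process)
        for other in wait_for.get(process, ()):
            state = colour.get(other, WHITE)
            if state == GREY:
                # 'other' is already on the current path, so the path closes a cycle.
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other, path)
                if cycle:
                    return cycle
        colour[process] = BLACK
        path.pop()
        return None

    for process in list(wait_for):
        if colour[process] == WHITE:
            cycle = visit(process, [])
            if cycle:
                return cycle
    return None
```

Any process on the returned cycle is a candidate victim for rollback and restart.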
Detecting the possibility of a deadlock before it occurs is
much more difficult and is, in fact, generally undecidable, because
the halting problem can be rephrased as a deadlock scenario.
However, in specific environments, using specific means of locking
resources, deadlock detection may be decidable. In the general
case, it is not possible to distinguish between algorithms that are
merely waiting for a very unlikely set of circumstances to occur and
algorithms that will never finish because of deadlock.
Deadlock detection techniques include, but are not limited to,
Model checking. This approach constructs a Finite State-model on
which it performs a progress analysis and finds all possible terminal
sets in the model. These then each represent a deadlock.
10
SECURITY AND AUTHORIZATION
Topics covered
10.1 GRANTING PRIVILEGES: [GRANT STATEMENT]
10.2 REVOKING PRIVILEGES [REVOKE STATEMENT]
10.3 PASSING PRIVILEGES: - (Using Grant option)
Implementing a security scheme and enforcing security
restrictions are the responsibilities of the dbms software.
The actions that a user is permitted to carry out on a
given database object (i.e. forms, application programs, tables or
entire database) are called privileges. Users may have permission to
insert and select a row in a certain table but may lack permission to
delete or update rows of that table.
To establish a security scheme on a database, you use the
SQL GRANT and REVOKE statements.
10.1 GRANTING PRIVILEGES: [GRANT STATEMENT]:

The GRANT statement is used to grant the privileges on the
database objects to specific users. Normally the GRANT statement
is used by owner of the table or view to give other users access to
the data. The GRANT statement includes list of the privileges to be
granted, name of the table to which privileges apply and user id to
which privileges are granted.
E.g. 1) Give user ABC full access to employee table:
GRANT SELECT, INSERT, DELETE, UPDATE ON employee TO ABC
2) Let user PQR only read the employee table. Update, delete and
insert are not allowed.
GRANT SELECT ON employee TO PQR
3) Give all users select access to employee table:
GRANT SELECT ON employee TO PUBLIC
Note that the GRANT statement in the last example grants
access to all present and future authorized users. This eliminates
the need for you to explicitly grant privileges to new users as they
are authorized.
10.2 REVOKING PRIVILEGES [REVOKE STATEMENT]:

In most SQL based databases, the privileges that you have
granted with the GRANT statement can be taken away with the
REVOKE statement. The structure of the REVOKE statement is
much similar to that of the GRANT statement. A REVOKE
statement may take away all or some of the privileges granted to a
user id.
E.g. REVOKE SELECT, INSERT ON employee FROM ABC
10.3 PASSING PRIVILEGES: - (Using Grant option)

When you create a database object and become its owner,
you are the only person who can grant privileges to use that
object. When you grant privileges to other users, they are allowed
to use the object but cannot pass those privileges on to other
users. In this way, the owner of the object maintains very tight
control, both over who has permission to use the object and over
which forms of access are allowed. Occasionally you may want to
allow other users to grant privileges on an object that you own.
Ex: Suppose user XYZ wants user ABC to be able to grant privileges
on the emp table to any other user. XYZ would issue:
GRANT SELECT ON emp TO ABC WITH GRANT OPTION
The keywords WITH GRANT OPTION enable user ABC to grant the
privilege on the emp table to any other user, even though ABC does
not own the table. ABC could then grant it to user PQR:
GRANT SELECT ON XYZ.emp TO PQR
10.3.1 REVOKE AND GRANT OPTION:
When you grant privileges WITH GRANT OPTION and
later revoke these privileges, most DBMS brands will automatically
revoke all privileges derived from the original grant. Consider a
chain of privileges from ABC to PQR and then from PQR to XYZ. If ABC
revokes PQR's privileges, then XYZ's privileges will also be
revoked. The situation gets more complicated if two or more users
have granted privileges and one of them later revokes the
privileges.
Consider that PQR gets the grant permission from both ABC
and XYZ and then grants privileges to MNO. When ABC revokes
permission from PQR, the grant from XYZ will remain. Further,
MNO's privileges will also remain because they can be derived from
XYZ.
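The cascading behaviour can be modelled as reachability in a graph of grants. The sketch below is a simplified model, not the algorithm of any specific DBMS: grants are (grantor, grantee) pairs for a single privilege, and the user called OWNER is a hypothetical table owner added so that both ABC and XYZ have something to pass on.

```python
def holders(owner, grants):
    """Users whose privilege can still be traced back to the owner."""
    reached = {owner}
    frontier = [owner]
    while frontier:
        grantor = frontier.pop()
        for source, grantee in grants:
            if source == grantor and grantee not in reached:
                reached.add(grantee)
                frontier.append(grantee)
    return reached

def revoke(owner, grants, grantor, grantee):
    """Remove one grant, then drop every grant issued by a user who lost the privilege."""
    remaining = set(grants) - {(grantor, grantee)}
    still_valid = holders(owner, remaining)
    return {(g, e) for g, e in remaining if g in still_valid}
```

With grants from OWNER to ABC and XYZ, from both of them to PQR, and from PQR to MNO, revoking the grant from ABC to PQR leaves PQR and MNO with the privilege, because a path through XYZ remains.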
Clinical Informatics Academy
Overview of Computer Hardware:

Introduction to Basic Terms &
Concepts
Fusheng Wang PhD
David A Gutman MD PHD
Goals of Lecture
Provide very brief overview of certain terms and
concepts that may be used throughout the day
Begin our ascent up the tower of Babel of Informatics
CPUs
Serves as the basic computation engine (brain) for
the machine
Intel and AMD make processors used in most
healthcare applications
Many machines now have multi-core, meaning a
single chip contains several individual CPUs
More cores != faster performance unless underlying
programs can work in parallel
Serial vs Parallel Operations

Have a 1 million line document and want to count the number
of times "diabetes" appears
Can start from the top and have the computer
keep going (serial operation)
Parallel version: split the document into equal parts,
have each core or node count its part,
and assemble the results
Some problems are easier to parallelize than others
e.g. run same analysis on different patients
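A minimal Python sketch of the serial and parallel versions of the word count follows. The function names and the worker count are assumptions for illustration. Threads are used to keep the example short; in standard Python a process pool would usually be needed for a real speed-up on this kind of CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

def count_term(lines, term):
    """Serial version: scan every line from the top."""
    return sum(line.lower().count(term) for line in lines)

def parallel_count(lines, term, workers=4):
    """Parallel version: split into roughly equal parts, count each part, add up the results."""
    size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(lambda chunk: count_term(chunk, term), chunks)
        return sum(partial_counts)
```

Counting is easy to parallelize because each part can be processed without knowing anything about the others; only the final sum needs all the partial results.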
GPU
Graphics Processing Unit

Specialized piece of hardware that is very good at
processing certain types (generally 3D image) data
Examples in clinical practice would be a GPU-accelerated DICOM workstation for
rendering/viewing 3-D image reconstructions
Networking
Computers are now networked together to allow
sharing of resources (printers/disks/etc)
Wired vs wireless
Networks can span different scopes: home, office,
department, university (intranet), the world
(Internet)
Amount of information that can be transmitted
(bandwidth) can vary dramatically
Slower/congested networks = longer times to load
files; particularly noticeable when transferring
imaging data (e.g. radiology)
RAM Memory vs Hard Drive

Random Access Memory (RAM): serves as short-term storage
for calculations & programs
Most RAM is volatile: turning off the computer deletes this data
More RAM generally means better performance, especially
with large data sets/many programs open
Hard disk: generally spinning platters that store information
at high density; non-volatile
Access time for RAM is generally at least tens to hundreds of times faster
than the corresponding disk access, so processors do not have to
wait for information
Storage Types & Networking

Local vs network storage

Local storage is physically part of the computer
generally faster than network storage
Network/shared storage: Use existing network
infrastructure and files live in a different physical
location but can be directly accessed
Has many advantages as backups/redundancy can
be engineered as part of the system
Disadvantage is that slow networks and/or network
outages = no access to files
Operating Systems
Software framework that manages/controls basic
operations on a given CPU allowing communication
between users and underlying hardware
Examples: Windows 7, Mac OSX, Unix/Linux and
derivatives
Programs written for a given OS generally can only
be run on that platform
Linux/Unix is sometimes used for specific high-end
uses (like Radiology workstation)
DISK Storage
Serve as main storage for
files/programs/pictures/images/etc
Much slower than memory
New flash drives use non-volatile memory similar to
what's used in a camera and can be used to store
frequently used data and allow quicker access than
conventional spinning disks
Network
General infrastructure that allows computers to
send information to and from each other
Can have different scopes/purposes
Bluetooth network: Short range communication
Intranet:
Firewalls
A firewall is a device or set of devices designed to
permit or deny network transmissions based upon a set
of rules and is frequently used to protect networks from
unauthorized access while permitting legitimate
communications to pass
Can be lax to very restrictive (block access to the entire
web)
Mandatory in clinical settings to protect patient data
Can allow access to certain resources
only at specific locations
Why firewalls can be important!
Looks like someone from Europe is trying

to access one of my servers by doing a
scan
Virtualization/Virtual Machines
Separates the applications/operating system from the
underlying hardware and creates a virtual copy
This virtual machine can be transferred to any
computer/hardware that can host the image in case of
hardware failure
Copies/snapshots can be made of the image to
facilitate backup/rollbacks/testing
Allows pooling of resources: a single machine can
host several virtual hosts
Performance of a virtual machine is no longer
significantly slower than a real machine for many
applications
Relatively inexpensive machines can be used as a
thin client to access a VM
Examples:
Remote Desktop to my virtual machine
Application at Emory
Emory uses a CITRIX based virtual desktop for
many/most clinical programs
This common framework greatly simplifies
administration/backup/security
Installation of programs is also simplified for IT as a
VM can be copied and deployed
Can access same files/applications from home (if in
the firewall), at Emory Clinic, at the Hospital, etc
Has many advantages in certain scenarios, although
can present challenges in research environments
Severely limits ability for clinicians/staff to
install/modify programs
The Cloud
With the rapid advancement of virtualization
technology and fast networks, no need to run virtual
machines locally
Many commodity calculations / services can be
outsourced to online service providers (Amazon S3,
Godaddy, Gmail, Dropbox, etc )
Specific machine/hardware an app runs on is
controlled dynamically and can be migrated
automatically in case of hardware failure at the
hosting provider**
Commodity model of software
+ hardware/pay as you go
Scripts/Programming
Script/Macro: a program (set of commands) that
performs a relatively simple action automatically
For example, can open a list of files/documents,
looking for the words "diabetes" and "hypertension",
and generate co-occurrence statistics vs making a
medical student or resident scan through documents
manually
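A script of this kind could look roughly like the Python sketch below. The function names, the four result categories and the default search terms are assumptions made for the illustration.

```python
from pathlib import Path

def read_documents(paths):
    """Load each file as text; undecodable bytes are replaced rather than raising."""
    return [Path(p).read_text(encoding="utf-8", errors="replace") for p in paths]

def cooccurrence(texts, term_a="diabetes", term_b="hypertension"):
    """Count documents mentioning both terms, only one of them, or neither."""
    counts = {"both": 0, "only_a": 0, "only_b": 0, "neither": 0}
    for text in texts:
        lowered = text.lower()
        has_a = term_a in lowered
        has_b = term_b in lowered
        if has_a and has_b:
            counts["both"] += 1
        elif has_a:
            counts["only_a"] += 1
        elif has_b:
            counts["only_b"] += 1
        else:
            counts["neither"] += 1
    return counts
```

Plain substring matching is deliberately naive here; it would also count negated mentions such as "no history of diabetes".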
Scripting in the wild / why we need Firewalls

85.114.135.121 - - [16/May/2011:08:44:09 -0400] "GET //scripts/setup.php HTTP/1.1" 404 302 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:09 -0400] "GET //admin/scripts/setup.php HTTP/1.1" 404 308 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:09 -0400] "GET //admin/pma/scripts/setup.php HTTP/1.1" 404 312 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:10 -0400] "GET //admin/phpmyadmin/scripts/setup.php HTTP/1.1" 404 319 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows
85.114.135.121 - - [16/May/2011:08:44:10 -0400] "GET //db/scripts/setup.php HTTP/1.1" 404 305 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:10 -0400] "GET //dbadmin/scripts/setup.php HTTP/1.1" 404 310 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:10 -0400] "GET //myadmin/scripts/setup.php HTTP/1.1" 404 310 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:10 -0400] "GET //mysql/scripts/setup.php HTTP/1.1" 404 308 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:11 -0400] "GET //mysqladmin/scripts/setup.php HTTP/1.1" 403 317 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:11 -0400] "GET //typo3/phpmyadmin/scripts/setup.php HTTP/1.1" 404 319 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 9
85.114.135.121 - - [16/May/2011:08:44:11 -0400] "GET //phpadmin/scripts/setup.php HTTP/1.1" 404 311 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:11 -0400] "GET //phpMyAdmin/scripts/setup.php HTTP/1.1" 403 317 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:12 -0400] "GET //phpmyadmin/scripts/setup.php HTTP/1.1" 403 317 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:12 -0400] "GET //phpmyadmin1/scripts/setup.php HTTP/1.1" 404 314 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:12 -0400] "GET //phpmyadmin2/scripts/setup.php HTTP/1.1" 404 314 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:12 -0400] "GET //pma/scripts/setup.php HTTP/1.1" 404 306 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:12 -0400] "GET //web/phpMyAdmin/scripts/setup.php HTTP/1.1" 404 317 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98
85.114.135.121 - - [16/May/2011:08:44:13 -0400] "GET //xampp/phpmyadmin/scripts/setup.php HTTP/1.1" 404 319 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows
85.114.135.121 - - [16/May/2011:08:44:13 -0400] "GET //web/scripts/setup.php HTTP/1.1" 404 306 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:13 -0400] "GET //php-my-admin/scripts/setup.php HTTP/1.1" 404 315 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:13 -0400] "GET //websql/scripts/setup.php HTTP/1.1" 404 309 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:13 -0400] "GET //phpmyadmin/scripts/setup.php HTTP/1.1" 403 317 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:14 -0400] "GET //phpMyAdmin/scripts/setup.php HTTP/1.1" 403 317 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:14 -0400] "GET //phpMyAdmin-2/scripts/setup.php HTTP/1.1" 404 315 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:14 -0400] "GET //php-my-admin/scripts/setup.php HTTP/1.1" 404 315 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
85.114.135.121 - - [16/May/2011:08:44:14 -0400] "GET //phpMyAdmin-2.2.3/scripts/setup.php HTTP/1.1" 404 319 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 9
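Log entries like these can themselves be summarized with a short script. The sketch below assumes the lines follow the usual web-server access-log layout seen above (client address, timestamp, quoted request, status code, size); the pattern and function name are illustrative.

```python
import re
from collections import Counter

# Matches the leading fields of an access-log line; anything after the size is ignored.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def summarize(lines):
    """Count requests per client address and per HTTP status code."""
    per_ip = Counter()
    per_status = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not look like log entries
        per_ip[match["ip"]] += 1
        per_status[match["status"]] += 1
    return per_ip, per_status
```

A single address requesting dozens of different setup.php paths within a few seconds, almost all answered with 404 or 403, is the signature of an automated scan.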
Databases and Spreadsheets
Learning curve of spreadsheets is low, however

search and query capability is extremely limited
Introduction to Data Management
Fusheng Wang
Center for Comprehensive Informatics
Emory University
Why Databases?
Data can be stored using multiple methods such as
text files, comma delimited data files, spreadsheets,
databases
Benefits of using a database:
A standard interface for accessing data
Multiple users could simultaneously insert, update and
delete data
Data could be changed without risk of losing data and its
consistency
Efficiently handle huge volumes of data
Tools for data backup, restore and recovery
Security
Reduce redundancy
Data independence
Database Management Systems (DBMS)

A Database Management System (DBMS) is a
software system designed to store, manage, and
facilitate access to databases
A relational DBMS (RDBMS) is a DBMS that is based
on the relational model
Objectives of DBMS
Representing information: data modeling
declarative language for querying data: SQL, XQuery
Efficient support of queries with access methods
Controlling concurrent access
Reliable data storage
The Current RDBMS Market (Forrester09)
Data Model
Defines how data is to be represented, structured,
linked, and constrained
Independent of specific implementations and protocols
Types of data models
Hierarchical model, network model

Relational model invented by E. F. Codd from IBM
Entity-Relationship (ER) model
Object-oriented model
Semi-structured (XML) model
Relational model
describes data as a collection of relations
Focuses on providing better data independence
Implemented by most DBMS in the market
ER Model
A popular conceptual model for database design

A database can be thought of as a collection of instances of
entities, independent of any other entities in the
database
Entities have attributes to characterize the entity
There could be relationships between entities: 1-to-1, 1-to-N, or M-to-N
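[Figure: ER diagram in which Patient and Observation are linked by a 1-to-N "has" relationship. Attribute labels shown: ID, name, age, start_date, end_date, concept_code, name, value, unit]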
Relational Database with Relational Model

A relational database is implemented based on
the relational data model
Data is stored in tables, which consist of columns and rows
Each column has a specific data type
Constraints can be specified, e.g. primary key,
uniqueness
Relationships can be defined with foreign keys
Primary key
SQL: Standard Language for RDBMS

SQL (Structured Query Language) is the standard
language of relational database access
Multiple standard revisions and multiple flavors
(implementations) exist
Procedural SQL (e.g. PL/SQL, SQL PL) adds
programming capabilities to SQL
A SQL query is compiled and executed by the DBMS
engine and the result is sent to the client
Many approaches to optimize SQL query
performance: indexes, parallel disk readings,
normalization, etc.
SQL (2)
Data Definition Language (DDL), defines properties
of data objects. e.g. creation of a table:
CREATE TABLE OBSERVATION_FACT (
    ENCOUNTER_NO INTEGER NOT NULL,
    PATIENT_ID   INTEGER,
    AGE          INTEGER,
    CONCEPT_CODE VARCHAR(50),
    NAME         VARCHAR(50),
    VALUE        VARCHAR(50),
    PRIMARY KEY (ENCOUNTER_NO, CONCEPT_CODE) );
Data manipulation language (DML): retrieve, insert,

update and delete data
SELECT AGE, COUNT(*) FROM OBSERVATION_FACT
WHERE NAME = 'CHEST PAIN'
GROUP BY AGE
ORDER BY AGE;
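The DDL and DML statements above can be tried end to end with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in for a full RDBMS, and the inserted rows (including the concept codes) are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the table.
conn.execute("""
    CREATE TABLE OBSERVATION_FACT (
        ENCOUNTER_NO INTEGER NOT NULL,
        PATIENT_ID   INTEGER,
        AGE          INTEGER,
        CONCEPT_CODE VARCHAR(50),
        NAME         VARCHAR(50),
        VALUE        VARCHAR(50),
        PRIMARY KEY (ENCOUNTER_NO, CONCEPT_CODE)
    )
""")

# DML: insert a few illustrative rows (hypothetical data).
rows = [
    (1, 101, 54, "C001", "CHEST PAIN", "yes"),
    (2, 102, 54, "C001", "CHEST PAIN", "yes"),
    (3, 103, 61, "C001", "CHEST PAIN", "yes"),
    (3, 103, 61, "C002", "COUGH", "no"),
]
conn.executemany("INSERT INTO OBSERVATION_FACT VALUES (?, ?, ?, ?, ?, ?)", rows)

# DML: count chest-pain observations per age.
result = conn.execute("""
    SELECT AGE, COUNT(*) FROM OBSERVATION_FACT
    WHERE NAME = 'CHEST PAIN'
    GROUP BY AGE
    ORDER BY AGE
""").fetchall()
```

The query is declarative: it states which rows and groups are wanted and leaves the choice of access method to the engine.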
Reliable Storage and High Availability

Problems could happen: system outage, transaction
failure, disk failure, disaster
Backup/restore: DBMS tracks changes as logs, thus
recovery is possible when failure happens
Full snapshot backup
Incremental backup: only changes since last
successful full backup
Restore: rebuild database from backups + logs
High availability: eliminate or minimize downtime

Creating and maintaining replica versions of database
Failover takes place when disaster happens
Access of Databases
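[Figure: database access paths. Two-tier: clients connect to the database server via JDBC, ODBC, or OLEDB. Three-tier: clients reach an application/web server over HTTP or HTTPS, which connects to the database server via JDBC, ODBC, or OLEDB.]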
Data Exchange: XML

The eXtensible Markup Language (XML) defines a
generic syntax used to mark up data with simple,
human-readable tags
Standard language for data exchange over the Web
HL7 CDA messaging in XML
Data can be published directly in XML from DBMS
<component>
<section>
<templateId root="1.3.6.1.4.1.19376.1.5.3.1.3.4"/>
<code code="10164-2" displayName="History of present illness"
codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC"/>
<title> History of present illness </title>
<text>Carcinoma of breast. Post operative diagnosis: same. left UOQ
breast mass. </text>
</section>
</component>
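Because the markup is regular, a program can pull individual fields out of such a message. The sketch below parses the fragment above with Python's standard xml.etree.ElementTree module. Real CDA documents normally declare an XML namespace, which this fragment omits, so the lookups here are simpler than they would be in practice; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

CDA_FRAGMENT = """<component>
<section>
<templateId root="1.3.6.1.4.1.19376.1.5.3.1.3.4"/>
<code code="10164-2" displayName="History of present illness"
codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC"/>
<title> History of present illness </title>
<text>Carcinoma of breast. Post operative diagnosis: same. left UOQ
breast mass. </text>
</section>
</component>"""

root = ET.fromstring(CDA_FRAGMENT)
section = root.find("section")
code = section.find("code")

summary = {
    "code": code.get("code"),
    "code_system": code.get("codeSystemName"),
    "title": section.findtext("title").strip(),
    # Collapse line breaks and repeated spaces in the narrative text.
    "text": " ".join(section.findtext("text").split()),
}
```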
Specialized DBMSs
Special data models/data types/queries:
XML DBMS: manage XML data directly in XML model
and query language
e.g.: DB2 pureXML, Oracle XML DB
Spatial DBMS: manage location related information.

E.g., find patients located within a 10-mile radius
e.g.: Oracle Spatial, ArcGIS, DB2 Spatial
Temporal DBMS: manage temporal oriented

information. e.g.: what drugs have been prescribed
with Proventil?
TeraData, DB2 TemporalDB
Parallel DBMS: use data partitioning and parallel data

access to increase I/O bandwidth for scalability
Oracle RAC, DB2 DPF, DB2 pureScale, TeraData
Example Database: PAIS
[Figure: PAIS architecture diagram. Labels shown: XML Schema, UML Model, Data Analysis Applications (MATLAB, SAS), Loading Manager, Image Analysis Algorithms, Data Mapping, Analytical Workflow, ZIP, Quality Control, PAIS Document Generator, Database Schema, Parallel Database, Data Staging, Image Viewer, Application Server, PAIS Data Repository, Clinical Data, caGrid Service, Molecular & Genetic, Annotation, Pathology Image Database, Image Data Management]
Questions?
Data Communication
& Networking Questions
By www.questionpaperz.in
DATA COMMUNICATION AND NETWORKS

1. Define the term Computer Networks.
A computer network is a number of computers interconnected by one or more
transmission paths. The transmission path often is the telephone line, due to its
convenience and universal presence.
2. Define Data Communication.
Data Communication is the exchange of data (in the form of 0s and 1s) between two
devices via some form of transmission medium (such as a wire cable).
3. What is the fundamental purpose behind data Communication?
The purpose of data communication is to exchange information between two agents.
4. List out the types of data Communication.
Data Communication is considered
Local if the communicating devices are in the same building.
Remote if the devices are farther apart.
5. Define the terms data and information.
Data: is a representation of facts, concepts and instructions presented in a formalized
manner suitable for communication, interpretation or processing by human beings or by
automatic means.
Information: is the meaning currently assigned to data by means of the conventions applied to those
data.
6. What are the fundamental characteristics on which the effectiveness of data
communication depends?
The effectiveness of a data communication system depends on three characteristics.
1. Delivery: The system must deliver data to the correct destination.
2. Accuracy: The system must deliver data accurately.
3. Timeliness: The system must deliver data in a timely manner.
7. Give components of data communication.
1. Message: the message is the information to be communicated.
2. Sender: the sender is the device that sends the data message.
3. Receiver: the receiver is the device that receives the message.
4. Medium: the transmission medium is the physical path by which a message
travels from sender to receiver.
5. Protocol: a protocol is a set of rules that govern data communication.

8. Define Network.
A Network is a set of devices (nodes) connected by media links. A node can be a
computer, printer, or any other device capable of sending and / or receiving data
generated by other nodes on the network.
9. What are the advantages of distributed processing?
1. Security / Encapsulation
2. Distributed database
3. Faster problem solving
4. Security through redundancy
5. Collaborative processing
10. What are the three criteria necessary for an effective and efficient network?
1. Performance
2. Reliability
3. Security
11. Name the factors that affect the performance of a network.
The performance of a network depends on a number of factors:
1. Number of users
2. Type of transmission medium
3. Capabilities of the connected hardware
4. Efficiency of software
12. Name the factors that affect the reliability of a network.
1. Frequency of failure
2. Recovery time of a network after a failure
3. Catastrophe
13. Name the factors that affect the security of a network.
Network security issues include protecting data from unauthorized access and viruses.
14. Define PROTOCOL
A protocol is a set of rules (conventions) that govern all aspects of data communication.
15. Give the key elements of protocol.
1. Syntax: refers to the structure or format of the data, meaning the order in which they
are presented.
2. Semantics: refers to the meaning of each section of bits.
3. Timing: refers to two characteristics: when data should be sent and
how fast they can be sent.
16. Define line configuration and give its types.

Line configuration refers to the way two or more
communication devices attach to a link.
There are two possible line configurations:
i. Point to point and
ii. Multipoint.
17. Define topology and mention the types of topologies.

Topology defines the physical or logical arrangement of links in a network
Types of topology :
- Mesh
- Star
- Tree
- Bus
- Ring
18. Define Hub.

In a star topology, each device has a dedicated point to point link only to a central
controller usually called a hub.
19. Give an advantage for each type of network topology.
1. Mesh topology:
* Use of dedicated links guarantees that each connection can carry its own data load,
thus eliminating traffic problems.
* Robust and privacy / security.
2. Star topology:
* Less expensive than mesh.
* Each device needs only one link and one I/O port to connect it to any number of
others.
* Robustness.
3. Tree topology:
* same as those of a star.
4. Bus topology:
* Ease of installation.
* Uses less cabling than mesh, star or tree topologies.

5. Ring topology:
* A ring is relatively easy to install and reconfigure.
* Each device is linked only to its immediate neighbors.
* Fault isolation is simplified.
20. Define transmission mode and its types.
Transmission mode defines the direction of signal flow between two linked devices.
Transmission modes are of three types.
- Simplex
- Half duplex
- Full duplex
21. What is LAN?

Local Area Network (LAN) is a network that uses technology designed to span a small
geographical area. For e.g. an Ethernet is a LAN technology suitable for use in a single
building.
22. What is WAN?
Wide Area Network (WAN) is a network that uses technology designed to span a large
geographical area. For e.g. a satellite network is a WAN because a satellite can relay
communication across an entire continent. WANs have higher propagation delay than
LANs.
23. What is MAN?
* A Metropolitan Area Network (MAN) is a network that uses technology designed to
extend over an entire city.
* For e.g. a company can use a MAN to connect the LANs in all its offices throughout
a city.
24. Define Peer to peer processes.
The processes on each machine that communicate at a given layer are called peer to
peer processes.
25. What is half duplex mode?
A transmission mode in which each station can both transmit and receive, but not at the
same time.
26. What is full duplex mode?
A transmission mode in which both stations can transmit and receive simultaneously.
27. What is an internet?
When two or more networks are connected, they become an internetwork, or internet.
The most notable internet is called the Internet.
28. What is the Internet?
The Internet is a worldwide communication system that has brought a wealth of information to our
fingertips and organized it for our use.
29. List the layers of the OSI model.
1. Physical
2. Data Link
3. Network
4. Transport
5. Session
6. Presentation
7. Application
30. Define OSI model.
The Open Systems Interconnection (OSI) model is a layered framework for the design of
network systems that allows communication across all types of computer systems.
31. Which OSI layers are the network support layers?
- Physical
- Data link
- Network
32. Which OSI layers are the user support layers?
- Session
- Presentation
- Application
33. What are the responsibilities of the physical layer, data link layer, network layer,
transport layer, session layer, presentation layer and application layer?
a. Physical layer: Responsible for transmitting individual bits from one node to the
next.
b. Data link layer: Responsible for transmitting frames from one node to the next.
c. Network layer: Responsible for the delivery of packets from the original source
to the final destination.
d. Transport layer: Responsible for delivery of a message from one process to
another.
e. Session layer: Responsible for establishing, managing and terminating sessions.
f. Presentation layer: Responsible for translating, encrypting and compressing data.
g. Application layer: Responsible for providing services to the user and allowing
access to network resources.
34. What is the purpose of dialog controller?
The session layer is the network dialog controller. It establishes, maintains and
synchronizes the interaction between communicating systems.
35. Name some services provided by the application layer.
Specific services provided by the application layer include the following:
- Network virtual terminal.
- File transfer, access and management (FTAM).
- Mail services.
- Directory services.
36. Define Network Virtual Terminal.
The Network Virtual Terminal is the OSI remote login protocol. It is an imaginary terminal with a
set of standard characteristics that every host understands.
37. Define the term transmission medium.
The transmission medium is the physical path between transmitter and receiver in a
data transmission system. The characteristics and quality of data transmission are
determined both by the nature of the signal and by the nature of the medium.
38. What are the types of transmission media?
Transmission media are divided into two categories. They are as follows:
I. Guided transmission media
II. Unguided transmission media
39. How do guided media differ from unguided media?
A guided medium is contained within physical boundaries, while an unguided medium is
boundless.
40. What are the three major classes of guided media?
Categories of guided media.
a. Twisted pair cable.
b. Coaxial cable.
c. Fiber optic cable.
41. What is a coaxial cable?
A type of cable used for computer networks as well as cable television. The name
arises from the structure, in which a metal shield surrounds a center wire. The shield
protects the signal on the inner wire from electrical interference.
42. A light beam travels to a less dense medium. What happens to the beam in each
of the following cases?
1. The incident angle is less than the critical angle:
the ray refracts and moves closer to the surface.
2. The incident angle is equal to the critical angle:
the light bends along the interface.
3. The incident angle is greater than the critical angle:
the ray reflects and travels again in the denser substance.
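The three cases can be checked numerically. The following is a minimal sketch, assuming the critical angle is obtained from Snell's law as arcsin(n_less_dense / n_dense); the function names and the refractive indices used in the example (1.5 and 1.0) are illustrative assumptions, not values from these notes.

```python
import math


def critical_angle_deg(n_dense, n_less_dense):
    """Critical angle (degrees) for light going from a denser to a less dense medium."""
    return math.degrees(math.asin(n_less_dense / n_dense))


def classify_ray(incident_deg, n_dense, n_less_dense):
    """Return which of the three cases applies for the given incident angle."""
    critical = critical_angle_deg(n_dense, n_less_dense)
    if math.isclose(incident_deg, critical, abs_tol=1e-9):
        return "bends along the interface"
    if incident_deg < critical:
        return "refracted"
    return "reflected"


# Hypothetical indices: 1.5 for the denser medium, 1.0 for the less dense one.
print(classify_ray(30, 1.5, 1.0))
print(classify_ray(60, 1.5, 1.0))
```

With these assumed indices the critical angle is a little under 42 degrees, so a 30 degree ray is refracted and a 60 degree ray is reflected back into the denser medium.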
43. What is reflection?
When the angle of incidence becomes greater than the critical angle, a new
phenomenon occurs called reflection.
44. Discuss the modes for propagating light along optical channels.
There are two modes for propagating light along optical channels:
single mode and multimode.
Multimode can be implemented in two forms: step index or graded index.
45. What is the purpose of cladding in an optical fiber? Discuss its density relative to
the core.
A glass or plastic core is surrounded by a cladding of less dense glass or plastic.
The difference in density of the two materials must be such that a beam of light
moving through the core is reflected off the cladding instead of being refracted into it.
46. Name the advantage of optical fiber over twisted pair and coaxial cable.
Higher bandwidth.
Less signal attenuation.
Immunity to electromagnetic interference.
Resistance to corrosive materials.
More immune to tapping.
Light weight.
47. What are the disadvantages of optical fiber as a transmission medium?
Installation / maintenance.
Unidirectional.
Cost: more expensive than other guided media.
48. What does the term modem stand for?
Modem stands for modulator / demodulator.
49. What is the function of a modulator?
A modulator converts a digital signal into an analog signal using ASK, FSK, PSK or
QAM.
50. What is the function of a demodulator?
A demodulator converts an analog signal into a digital signal.
51. What is an intelligent modem?
Intelligent modems contain software to support a number of additional functions such
as automatic answering and dialing.
52. What are the factors that affect the data rate of a link?
The data rate of a link depends on the type of encoding used and the bandwidth of the
medium.
53. Define Line coding.
Line coding is the process of converting binary data, a sequence of bits, to a digital
signal.
54. For n devices in a network, what is the number of cable links necessary for mesh,
ring, bus and star networks?
Number of links for mesh topology : n(n - 1) / 2.
Number of links for ring topology : n - 1.
Number of links for bus topology : one backbone and n drop lines.
Number of links for star topology : n.
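The counts above are easy to tabulate for a concrete n. This is a small sketch that follows the figures given in this answer; the function name and dictionary keys are illustrative assumptions. Note that the ring figure is the one stated in these notes, while a ring that is fully closed (last station wired back to the first) uses n links.

```python
def links_needed(n):
    """Cable counts for n devices, following the figures in the answer above."""
    return {
        "mesh": n * (n - 1) // 2,   # every pair of devices gets a dedicated link
        "ring": n - 1,              # figure from these notes; a closed ring uses n
        "bus_backbone": 1,          # one shared backbone cable
        "bus_drop_lines": n,        # one drop line per device
        "star": n,                  # one link from each device to the hub
    }


for name, count in links_needed(5).items():
    print(name, count)
```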
55. What are the design issues of the datalink layer?
1) Services provided to the network layer
2) Framing
3) Error control
4) Flow control
56. What is a datalink?
When a datalink control protocol is used, the transmission medium between systems is
referred to as a datalink.
57. What is the main function of the datalink layer?
The datalink layer transforms the physical layer, a raw transmission facility, into a reliable
link and is responsible for node-to-node delivery.
58. What is a datalink protocol?
Datalink protocol is a layer of control present in each communicating device that
provides functions such as flow control, error detection and error control.
59. What is meant by flow control?
Flow control is a set of procedures used to restrict the amount of data that the sender
can send before waiting for an acknowledgement.
60. How is error controlled in a datalink control protocol?
In a datalink control protocol, error control is achieved by retransmitting damaged
frames that have not been acknowledged, or for which the other side requests a retransmission.
61. Discuss the concept of redundancy in error detection.
Error detection uses the concept of redundancy, which means adding extra bits for
detecting errors at the destination.
62. What are the three types of redundancy checks used in data communications?
- Vertical Redundancy Check (VRC)
- Longitudinal Redundancy Check (LRC)
- Cyclic Redundancy Check (CRC)
63. How can the parity bit detect a damaged data unit?
In a parity check, a redundant bit (the parity bit) is added to every data unit so that the total
number of 1s is even for even parity (or odd for odd parity). If the receiver counts a total that
violates the chosen parity, it knows the data unit was damaged.
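A short sketch of even parity, assuming data units are represented as strings of '0' and '1' characters; the function names are hypothetical and chosen only for illustration.

```python
def add_even_parity(bits):
    """Append a parity bit so that the total number of 1s is even."""
    parity = bits.count("1") % 2
    return bits + str(parity)


def passes_even_parity(unit):
    """Receiver-side check: the unit is accepted only if it has an even number of 1s."""
    return unit.count("1") % 2 == 0


sent = add_even_parity("1100001")          # three 1s, so the parity bit is 1
damaged = "0" + sent[1:]                   # flip the first bit in transit
print(sent, passes_even_parity(sent))
print(damaged, passes_even_parity(damaged))
```

A single flipped bit changes the count of 1s from even to odd and is detected. Two flipped bits restore an even count, so a simple parity check misses that kind of error.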
64. How can we use the Hamming code to correct a burst error?
By rearranging the order of bit transmission of the data units, the Hamming code can
correct burst errors.
65. Briefly discuss Stop and Wait method of flow control?
In Stop and Wait of flow control, the sender sends one frame and waits for an
acknowledgement before sending the next frame.
66. In the Hamming code for a data unit of m bits how do you compute the number of
redundant bits r needed?
In the Hamming code, for a data unit of m bits, use the formula 2^r >= m + r + 1 to
determine r, the number of redundant bits needed.
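The inequality can be solved by trying r = 0, 1, 2, ... until it holds. A minimal sketch (the function name is an assumption made for illustration):

```python
def redundant_bits(m):
    """Smallest r such that 2**r >= m + r + 1, for a data unit of m bits."""
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    return r


for m in (1, 4, 7, 8):
    print(m, redundant_bits(m))
```

For example, a 7-bit data unit needs r = 4, because 2^3 = 8 is less than 7 + 3 + 1 = 11, while 2^4 = 16 is at least 7 + 4 + 1 = 12.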
67. What are three popular ARQ mechanisms?
- Stop and wait ARQ
- Go Back N ARQ
- Selective Repeat ARQ
68. How does ARQ correct an error?

Anytime an error is detected in an exchange, a negative acknowledgment (NAK) is
returned and the specified frames are retransmitted.
69. What is the purpose of the timer at the sender site in systems using ARQ?
The sender starts a timer when it sends a frame. If an acknowledgment is not received
within an allotted time period, the sender assumes that the frame was lost or damaged
and resends it.
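The stop-and-wait method and the sender's timer can be pictured with a toy simulation. This sketch makes several simplifying assumptions that are not from these notes: there is no real network or clock, a lost transmission is modelled by listing its attempt number in `lost_attempts`, and every lost attempt is treated as a timer expiry followed by a resend. The 1-bit sequence number alternates 0, 1, 0, 1.

```python
def stop_and_wait(frames, lost_attempts):
    """Send frames one at a time; resend a frame whenever its attempt is 'lost'.

    lost_attempts is a set of 0-based transmission attempt numbers for which
    no acknowledgment arrives before the (simulated) timer expires.
    """
    log = []
    attempt = 0
    for index, _frame in enumerate(frames):
        seq = index % 2                      # alternating sequence number
        while True:
            lost = attempt in lost_attempts
            attempt += 1
            if lost:
                log.append((seq, "timeout, resend"))
                continue                     # timer expired: send the same frame again
            log.append((seq, "acked"))
            break                            # ACK received: move on to the next frame
    return log


for entry in stop_and_wait(["a", "b"], lost_attempts={1}):
    print(entry)
```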
70. What is a damaged frame?
A damaged frame is a recognizable frame that does arrive, but some of the bits are in
error (have been altered during transmission).
71. What is HDLC?
HDLC is a bit-oriented datalink protocol designed to support both half-duplex and full-duplex
communication over point-to-point and multipoint links.
72. Give the data transfer modes of HDLC.
1. NRM - Normal Response Mode
2. ARM - Asynchronous Response Mode
3. ABM - Asynchronous Balanced Mode
73. How many types of frames does HDLC use?
1. U-Frames
2. I-Frames
3. S-Frames
74. State phases involved in the operation of HDLC?
1. Initialization
2. Data transfer
3. Disconnect
75. What is the meaning of ACK frame?

ACK frame is an indication that a station has received something from another.
76. What is CSMA?
Carrier Sense Multiple Access is a protocol used to sense whether a medium is busy
before attempting to transmit.
77. Explain CSMA/CD.
Carrier Sense Multiple Access with Collision Detection is a protocol used to sense
whether a medium is busy before transmission, but it also has the ability to detect whether a
transmission has collided with another.
78. State the advantages of Ethernet.
1. Inexpensive
2. Easy to install
3. Supports various wiring technologies
79. What is fast Ethernet?
It is the high speed version of Ethernet that supports data transfer rates of 100 Mbps.
80. What is bit stuffing and why is it needed in HDLC?
Bit stuffing is the process of adding one extra 0 whenever there are five consecutive 1s
in the data, so that the receiver does not mistake the data for a flag. Bit stuffing is needed
to handle data transparency.
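A minimal sketch of the stuffing rule and its inverse, assuming the data is given as a string of '0' and '1' characters (the function names are illustrative). The HDLC flag is 01111110, so a run of six 1s must never appear inside the transmitted data.

```python
def stuff(bits):
    """Insert a 0 after every run of five consecutive 1s."""
    out = []
    run = 0
    for b in bits:
        out.append(b)
        if b == "1":
            run += 1
            if run == 5:
                out.append("0")   # stuffed bit
                run = 0
        else:
            run = 0
    return "".join(out)


def unstuff(bits):
    """Remove the 0 that follows every run of five consecutive 1s."""
    out = []
    run = 0
    skip_next = False
    for b in bits:
        if skip_next:             # this bit is the stuffed 0, drop it
            skip_next = False
            continue
        out.append(b)
        if b == "1":
            run += 1
            if run == 5:
                skip_next = True
                run = 0
        else:
            run = 0
    return "".join(out)


data = "01111110"                 # looks exactly like a flag
print(stuff(data))
print(unstuff(stuff(data)) == data)
```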
81. What is a bridge?
Bridge is a hardware networking device used to connect two LANs. A bridge operates at
data link layer of the OSI reference model.
82. What is a repeater?
A repeater is a hardware device used to strengthen signals being transmitted on a
network.
83. Define router?
A network layer device that connects networks with different physical media and
translates between network architectures.
84. State the functions of bridge?
1. Frame filtering and forwarding
2. Learning the address
3. Routing
85. List any two functions which a bridge cannot perform?

- Bridge cannot determine most efficient path.
- Traffic management function.
86. What is a hub?
Networks require a central location to bring media segments together. These central
locations are called hubs.
87. State important types of hubs.
1. Passive hub
2. Active hub
3. Intelligent hub
88. Mention the functions of a hub.
1. Facilitates adding, deleting or moving workstations
2. Extends the length of the network
3. Provides centralized management services
4. Provides multiple interfaces
89. What is the main function of a gateway?
A gateway is a protocol converter.
90. At which layer does a gateway operate?
A gateway operates at all seven layers of the OSI model.
91. Which factors does a gateway handle?
Data rate, data size and data format.
92. What is meant by active hub?
A central hub in a network that retransmits the data it receives.
93. What is the function of ACK timer?
ACK timer is used in flow control protocols to determine when to send a separate
acknowledgment in the absence of outgoing frame.
94. What are the types of bridges?
1. Transparent bridge
2. Source routing bridge
Transparent bridge - A transparent bridge keeps a table of addresses in memory to
determine where to send data.
Source routing bridge - A source routing bridge requires the entire route to be
included in the transmission and does not route packets intelligently.
95. What are transceivers?
Transceivers are a combination of transmitter and receiver. Transceivers are also
called medium attachment units (MAUs).
96. What is the function of a NIC?
A NIC is used to allow the computer to communicate on the network. It supports
transmitting, receiving and controlling traffic with other computers on the network.
97. Mention different random access techniques.
1. ALOHA
2. CSMA
3. CSMA/CD
98. What is the function of a router?
Routers relay packets among multiple interconnected networks. They route packets from
one network to any number of potential destination networks on an internet.
99. How does a router differ from a bridge?
Routers connect separate networks, possibly of different types, and operate at the
network layer, forwarding packets by logical (IP) address. Bridges connect segments of the
same type of LAN and operate at the data link layer, forwarding frames by physical (MAC) address.
100. Identify the class and default subnet mask of the IP address 217.65.10.7.
It belongs to class C.
Default subnet mask: 255.255.255.0
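The class of an address under classful addressing is decided by its first octet. A minimal sketch, assuming dotted-decimal IPv4 input; the function name is hypothetical, and classes D and E are returned without a mask because they have no default subnet mask.

```python
def classful_info(address):
    """Return (class letter, default subnet mask) for a dotted-decimal IPv4 address."""
    first_octet = int(address.split(".")[0])
    if first_octet < 128:
        return ("A", "255.0.0.0")        # first bit 0 (0 and 127 are reserved)
    if first_octet < 192:
        return ("B", "255.255.0.0")      # first bits 10
    if first_octet < 224:
        return ("C", "255.255.255.0")    # first bits 110
    if first_octet < 240:
        return ("D", None)               # multicast
    return ("E", None)                   # reserved


print(classful_info("217.65.10.7"))
```

Since 217 lies between 192 and 223, the address falls in class C, whose default mask is 255.255.255.0.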
101. What are the fields present in an IP address?
Netid and Hostid.
Netid: the portion of the IP address that identifies the network.
Hostid: the portion of the IP address that identifies the host or router on the network.
102. What is flow control?
Flow control is the mechanism that keeps a fast sender from swamping a slow receiver
with data.
103. What are the functions of the transport layer?
The transport layer is responsible for reliable data delivery. Its functions are:
i. It breaks messages into packets.
ii. It performs error recovery if the lower layers are not adequately error free.
iii. It performs flow control if this is not done adequately at the network layer.
iv. It multiplexes and demultiplexes sessions.
v. It can be responsible for setting up and releasing connections across the
network.
104. What is segmentation?
When the size of the data unit received from the upper layer is too long for the network
layer datagram or datalink frame to handle, the transport protocol divides it into
smaller, usable blocks. The dividing process is called segmentation.
105. What is Transmission Control Protocol (TCP)?
TCP is the TCP/IP protocol that provides application programs with access to a connection-oriented
communication service. TCP offers reliable, flow-controlled delivery. More
importantly, TCP accommodates changing conditions in the Internet by adapting its
retransmission scheme.
106. Define the terms (i) Host (ii) IP
a. Host: An end user's computer connected to a network. In an internet each computer
is classified as a host or a router.
b. IP: The Internet Protocol, which defines both the format of packets used on a TCP/IP internet
and the mechanism for routing a packet to its destination.
107. What is UDP?
User Datagram Protocol is the TCP/IP protocol that provides application programs with
a connectionless communication service.
108. What is the segment?
The unit of data transfer between two devices using TCP is a segment.
109. What is a port?
Applications running on different hosts communicate over TCP with the help of a concept
called ports. A port is a 16-bit number allocated to a particular application.
110. What is a socket?
A socket is the communication structure (endpoint) used in network programming.
A port identifies a single application on a single computer; a socket combines that port
with the computer's IP address.
Socket = IP address + Port number
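The pairing of IP address and port is visible in Python's standard `socket` module, where a socket address is an (IP, port) tuple. The sketch below is only an illustration: it assumes the loopback address 127.0.0.1 is usable on the machine, lets the operating system choose free ports, and uses a hypothetical helper name `loopback_datagram`.

```python
import socket


def loopback_datagram(message):
    """Send one UDP datagram between two sockets on the local machine."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
        receiver.settimeout(2.0)                 # do not wait forever for the datagram
        receiver_addr = receiver.getsockname()   # (IP address, port number)
        sender.sendto(message, receiver_addr)
        data, sender_addr = receiver.recvfrom(1024)
        return data, receiver_addr, sender_addr
    finally:
        sender.close()
        receiver.close()


data, receiver_addr, sender_addr = loopback_datagram(b"hello")
print(data, receiver_addr, sender_addr)
```

Both addresses printed are (IP address, port number) pairs: same IP address, different ports, hence two distinct sockets on one computer.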
111. How does TCP differ from the sliding window protocols?
TCP differs from the sliding window protocols in the following ways:
1. When using TCP, applications treat the data sent and received as an arbitrary byte
stream. The sending TCP module divides the byte stream into a set of packets called segments,
and sends individual segments within an IP datagram. TCP decides where segment boundaries
start and end.
2. The TCP sliding window operates at the byte level rather than the packet (or
segment) level. The left and right window edges are byte pointers.
3. Segment boundaries may change at any time. TCP is free to retransmit two adjacent
segments each containing 200 bytes of data as a single segment of 400 bytes.
4. The sizes of the send and receive windows change dynamically.
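Points 1 to 3 can be pictured with a toy model of a byte stream. This is only a sketch under stated assumptions: a segment is modelled as a (starting byte number, payload) pair, headers and real TCP behaviour are ignored, and the names `segment_stream`, `merge_adjacent` and `max_payload` are hypothetical.

```python
def segment_stream(data, max_payload, first_seq=0):
    """Cut a byte stream into (starting byte number, payload) pairs."""
    return [
        (first_seq + offset, data[offset:offset + max_payload])
        for offset in range(0, len(data), max_payload)
    ]


def merge_adjacent(seg_a, seg_b):
    """Combine two back-to-back segments into one, as a retransmission may do."""
    seq_a, payload_a = seg_a
    seq_b, payload_b = seg_b
    if seq_a + len(payload_a) != seq_b:
        raise ValueError("segments are not adjacent in the byte stream")
    return (seq_a, payload_a + payload_b)


stream = bytes(400)                            # 400 bytes of application data
first, second = segment_stream(stream, 200)    # two 200-byte segments
combined = merge_adjacent(first, second)       # one 400-byte segment
print(first[0], second[0], combined[0], len(combined[1]))
```

Because positions are counted in bytes rather than in segments, the receiver sees the same byte stream whether the data arrives as two 200-byte segments or as one 400-byte segment.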
112. Explain how the TCP provides the reliability?
A number of mechanisms provide the reliability.
1. Checksum
2. Duplicate data detection
3. Retransmission
4. Sequencing
5. Timers
113. What is a datagram socket?
A structure designed to be used with connectionless protocols such as UDP.
114. What is congestion?
When the load on a network is greater than its capacity, there is congestion of data packets.
Congestion occurs because routers and switches have queues (buffers) that fill up when
packets arrive faster than they can be processed.
115. Define the term Jitter.
Jitter is the variation in delay for packets belonging to the same flow.
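One simple way to put a number on this variation is to compare the delays of consecutive packets in the flow. The sketch below assumes that measure (other definitions exist), takes delays in milliseconds, and uses illustrative function names and sample values that are not from these notes.

```python
def delay_variation(delays_ms):
    """Absolute difference in delay between each pair of consecutive packets."""
    return [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]


def mean_jitter(delays_ms):
    """Average of the consecutive delay differences (0.0 if fewer than two packets)."""
    variation = delay_variation(delays_ms)
    return sum(variation) / len(variation) if variation else 0.0


steady = [20, 20, 20, 20]      # every packet takes the same time: no jitter
uneven = [20, 22, 21, 25]      # delays differ from packet to packet
print(delay_variation(steady), mean_jitter(steady))
print(delay_variation(uneven), mean_jitter(uneven))
```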
116. What is Configuration management?

Configuration management (CM) is a field of management that focuses on establishing
and maintaining consistency of a system or product's performance and its functional and
physical attributes with its requirements, design, and operational information throughout
its life.
117. What is Fault management?
Fault management is the set of functions that detect, isolate, and correct malfunctions in
a telecommunications network, compensate for environmental changes, and include
maintaining and examining error logs, accepting and acting on error detection
notifications, tracing and identifying faults, carrying out sequences of diagnostics tests,
correcting faults, reporting error conditions, and localizing and tracing faults by
examining and manipulating database information.
118. What is Performance management?
Performance management includes activities that ensure that goals are consistently
being met in an effective and efficient manner. Performance management can focus on
the performance of an organization, a department, employee, or even the processes to
build a product or service, as well as many other areas.
119. What is Security management?
Security Management is a broad field of management related to asset management,
physical security and human resource safety functions. It entails the identification of an
organization's information assets and the development, documentation and
implementation of policies, standards, procedures and guidelines.
120. What is Accounting management?
Accounting Management is the practical application of management techniques to
control and report on the financial health of the organization. This involves the analysis,
planning, implementation, and control of programs designed to provide financial data
reporting for managerial decision making. This includes the maintenance of bank
accounts, developing financial statements, cash flow and financial performance analysis.

1) The code for a Web page is written using Hypertext Markup language
2) The first computers were programmed using Machine language
3) A filename is a unique name that you give to a file of information
4) Software can be another word for program
5) Any data or instruction entered into the memory of a computer is considered as input
6) The monitor is the part of the computer that displays the work done
7) A series of instructions that tells a computer what to do and how to do it is called a program
8) Hardware is the part of a computer that one can touch and feel.
9) The role of a computer sales representative generally is to determine a buyer's needs and match them to
the correct hardware and software.
10) Supercomputers are the fastest, biggest and most expensive computers
11) Executing is the process of carrying out commands
12) The rectangular area of the screen that displays a program, data, and or information is a window

13) The process of a computer receiving information from a server on the internet is known as
downloading
14) A disk drive is the part of the computer that helps to store information
15) Arithmetic operations include addition, subtraction, multiplication, and division
16) A keyboard is a kind of input device
17) An error in a program is known as a bug
18) A collection of related information sorted and dealt with as a unit is a file
19) Sending an e-mail is similar to writing a letter
20) IT stands for information technology
21) A menu contains commands that can be selected
22) Plotter, printer and monitor form a group that consists of output devices
23) Edit menu is selected to cut, copy and paste
24) The most important or powerful computer in a typical network is the network server
25) The primary purpose of software is to turn data into information
26) Direct access is the ability to find an individual item in a file immediately.
27) To make a notebook act as a desktop model, the notebook can be connected to a docking station
which is connected to a monitor and other devices
28) You can use the Tab key to move the cursor across the screen and to indent a paragraph.
29) A collection of related files is called a database.
30) Storage that retains its data after the power is turned off is referred to as non-volatile storage.
31) Internet is an example of connectivity.
32) Testing is the process of finding errors in software code.
33) A programming language contains specific rules and words that express the logical steps of an algorithm.
34) Changing an existing document is called editing the document
35) Virtual memory is memory on the hard disk that the CPU uses as an extended RAM.
36) Computers use the binary number system to store data and perform calculations.
37) The Windows key will launch the Start menu.

38) To move to the beginning of a line of text, press the home key.
39) When sending an e-mail, the subject line describes the contents of the message.
40) You work with tables, paragraphs and indexes when formatting text in Word.
41) TB is the largest unit of storage.
42) The operating system tells the computer how to use its components.
43) When cutting and pasting, the item cut is temporarily stored in clipboard.
44) The blinking symbol on the computer screen is called the cursor.
45) Magnetic tape is not practical for applications where data must be quickly recalled because tape is a
sequential access medium.
46) Rows and columns are used to organize data in a spreadsheet.
47) When you are working on a document on a PC, the document is temporarily stored in RAM.
48) One megabyte equals approximately 1 million bytes.
49) Information travels between components on the motherboard through buses.
50) RAM refers to the memory in your computer.
51) Computers connected to a LAN can share information and/or share peripheral equipment
52) Microsoft Office is an application suite
53) Utilities can handle most system functions that aren't handled directly by the operating system
54) If you receive an e-mail from someone you don't know then you should delete it without opening it
55) A set of instructions telling the computer what to do is called a program
56) LAN refers to a small single site network
57) A collection of programs that controls how your computer system runs and processes information is
called operating system.
58) Device drivers are small, special-purpose programs
59) Transformation of input into output is performed by the CPU
60) Data going into the computer is called input.
61) A binary choice offers only two options
62) To indent the first paragraph of your report, you should use the Tab key

63) Data are distinct items that don't have much meaning to you in a given context
64) A website address is a unique name that identifies a specific web site on the web
65) Modem is an example of a telecommunications device
66) A set of computer programs used for a certain function such as word processing is the best
definition of a software package
67) You can start Microsoft Word by using the Start button
68) A blinking symbol on the screen that shows where the next character will appear is a cursor
69) Highlight and delete is used to remove a paragraph from a report you have written
70) Date and time are available on the desktop at the taskbar
71) A directory within a directory is called a subdirectory
72) Testing is the process of finding errors in software code
73) In Excel, charts are created using the Chart Wizard option
74) Microcomputer hardware consists of three basic categories of physical equipment: system unit,
input/output and memory

75) Windows is not a common feature of software applications
76) A tool bar contains buttons and menus that provide quick access to commonly used commands
77) For creating a document, you use the New command in the File menu
78) Input device is equipment used to capture information and commands
79) A programming language contains specific rules and words that express the logical steps of an
algorithm
80) One advantage of dial-up internet access is that it utilizes existing telephone service
81) Protecting data by copying it from the original source is backup
82) Network components are connected to the same cable in the bus topology
83) Two or more computers connected to each other for sharing information form a network
84) A computer checks the database of user names and passwords for a match before granting access
85) Computers that are portable and convenient for users who travel are known as laptops
86) Spam is the term for unsolicited e-mail

87) Operating system is the type of program that controls the various computer parts and allows the user to interact
with the computer
88) Each cell in a Microsoft Office Excel document is referred to by its cell address, which is the cell's row
and column labels
89) An eight-digit binary number is called a byte
90) Office LANs that are spread geographically apart on a large scale can be connected using a corporate
WAN
91) Installation is the process of copying software programs from secondary storage media to the hard disk
92) The code for a web page is written using Hyper Text Markup Language
93) Small application programs that run on a Web page and may ensure a form is completed properly or
provide animation are known as flash
94) In a relational database, table is a data structure that organizes the information about a single topic
into rows and columns
95) The first computers were programmed using machine language
96) When the pointer is positioned on a hyperlink it is shaped like a hand
97) Booting process checks to ensure the components of the computer are operating and connected
properly

98) By checking the existing files saved on the disk, the user can determine what programs are available on a
computer
99) Special effect used to introduce slides in a presentation are called animation
100) Computers send and receive data in the form of digital signals
101) Most World Wide Web pages contain commands in the HTML language
102) Icons are graphical objects used to represent commonly used applications
103) UNIX is not owned and licensed by a company
104) In any window, the maximize button, the minimize button and the close buttons appear on the title
bar
105) Dial-up Service is the slowest internet connection service
106) Every component of your computer is either hardware or software
107) Checking that a pin code number is valid before it is entered into the system is an example of data
validation
108) A compiler translates higher level programs into a machine language program, which is called
object code
109) Direct access is the ability to find an individual item in a file immediately
110) Computers connected to a LAN can share information and/or share peripheral equipment
111) A CD-RW disk can be erased and rewritten
112) The two major categories of software include system and application
113) Windows 95, Windows 98 and Windows NT are known as operating systems
114) Information on a computer is stored as digital data
115) A spreadsheet works like a calculator for keeping track of money and making budgets
116) To take information from one source and bring it to your computer is referred to as download
117) Each box in a spread sheet is called a cell
118) Network components are connected to the same cable in the bus topology
119) Two or more computers connected to each other for sharing information form a network
120) A computer checks the database of user names and passwords for a match before granting access.
121) Spam is the other name for unsolicited e-mail

122) Operating system controls the various computer parts and allows the user to interact with the
computer
123) Each cell in a Microsoft Office Excel document is referred to by its cell address, which is the cell's
row and column labels
124) Installation is the process of copying software programs from secondary storage media to the hard
disk
125) The code for a web page is written using Hypertext Markup Language
126) Small application programs that run on a web page and may ensure a form is completed properly
or provide animation are known as Flash
127) A filename is a unique name that you give to a file of information
128) For seeing the output, you use a monitor
129) CDs are round in shape
130) Control key is used in combination with another key to perform a specific task
131) Scanner will translate images of text, drawings and photos into digital form
132) CPU is the brain of the computer
133) Something which has easily understood instructions is said to be user friendly
134) Information on a computer is stored as digital data
135) For creating a document, you use new command at file menu
136) Programs and data are kept in main memory while the processor is using them
137) Ctrl + A command is used to select the whole document
138) Sending an e-mail is same as writing a letter
139) A Website address is a unique name that identifies a specific website on the web
140) Answer sheets in bank POs/Clerks examinations are checked by using Optical Mark Reader
141) Electronic data exchange provides strategic and operational business opportunity
142) Digital signals used in ISDN have whole number values
143) Assembler is language translation software
144) Manual data can be put into computer by scanner
145) In a bank, after computerization cheques are taken care of by MICR

146) The banks use MICR device to minimize conversion process
147) Image can be sent over telephone lines by using scanner
148) Microchip elements are unique to a smart card
149) MS-DOS is a single user operating system
150) BASIC can be used for scientific and commercial purposes
151) All computers can execute machine language programs
152) Programs stored in ROM can't be erased
153) Ethernet is used for high speed telecommunications
154) IP address can change even if the domain name remains same
155) Each directory entry can be of 32 bytes
156) With the help of Control + Del a letter can be erased in a word
157) Disk can keep maximum data
158) FORTRAN is a scientific computer language
159) Computer language COBOL is useful for commercial work
160) COBOL is a high standard language like English
161) In a computer, the length of a word can be measured in bytes
162) Byte is the unit of storage medium
163) ROM is not a computer language
164) Oracle is a database software
165) Sequential circuit is full aid
166) A processor is a must for a computer
167) ROM keeps permanent memory
168) Screen display is called by windows in lotus
169) Pascal is a computer language
170) The expanded form of IBM is International Business Machines

171) IC chips are made of silicon
172) India's Silicon Valley is situated at Bangalore
173) RAM and ROM are the storage devices of computer
174) DOS is to create relation between hardware and software
175) LOTUS 1-2-3 is software
176) PIN is a personal security code for GSM subscribers
177) Tables holds actual data in the database
178) Trojan is a virus
179) Static keys make WEP insecure
180) Video signal needs highest bandwidth
181) Connectivity means communication between systems
182) Controlling is not required for high level language programs before it is executed
183) 3 out of three rollers are responsible for the movement of cursor on screen
184) Hardware that adds two numbers is arithmetic logical unit
185) Data accuracy is not done by modem
186) LAN is used for networks setup within a building
187) A data communication system requires terminal device, communication channel, protocols
188) Most common channel used by networks today is satellite
189) Run Time is not a type of error
190) A five-digit attribute used for postal ZIP codes will be stored as numeric data
191) Computer viruses can be attached to an executable program
192) MS-DOS was the first operating system
193) The smallest space where information on a hard disk is kept is a cluster
194) Information is processed data
195) Intelligence is not a characteristic of a computer

196) A private key is used to append a digital signature
197) Negative numbers can be represented in binary
198) VDU is not an essential part of a computer
199) The printers are line printer, laser, dot matrix
200) Speed of clock of CPU is measured in megahertz
201) Cache is not a secondary storage device
202) Disk can be used to store sequential files and random files
203) Windows is not an application
204) When taking the output information is produced in hardcopy and/or softcopy form
205) The control unit's function is to decode program instructions
206) The most powerful type of computer amongst the following is supercomputer
207) GO TO statement is used in C, C++, basic language
208) File menu is selected to print
209) The name a user assigns to a document is called a filename
210) A processor is an electronic device that processes data, converting it into information
211) Reserved words are words that a programming language has set aside for its own use
212) Monitor and printer are the two types of output devices
213) To access properties of an object, the mouse technique to use is right-clicking
214) An operating system is a program that makes the computer easier to use
215) Connections to the internet using a phone line and a modem are called dial-up connections
216) To access a mainframe or supercomputer, users often use a terminal
217) A flaw in a program that causes it to produce incorrect or inappropriate results is called a bug
218) A web site address is a unique name that identifies a specific web site on the web
219) Every component of your computer is either hardware or software
220) To make the number pad act as directional arrows, you press the num lock key

221) When creating a word-processed document, formatting text involves the user changing how words
on the page appear, both on the screen and in printed form
222) The ALU performs simple mathematics for the CPU
223) A computer program is a set of keywords, symbols, and a system of rules for constructing
statements by which humans can communicate the instructions to be executed by a computer
224) Another word for program is software
225) The name of the computer's brain is CPU
226) A computer is a device that electronically processes data, converting it to information
227) Laptops are computers that can be carried around easily
228) The secret code that restricts entry to some programs is password
229) The basic goal of computer process is to convert data into information
230) The disk is placed in the CPU in a computer
231) A hard copy of a document is printed on the printer
232) The name that the user gives to a document is referred to as file name

233) Restarting a computer that is already on is referred to as warm booting
234) E-mail is the transmission of messages and files via a computer network
235) The person who writes and tests computer programs is called a programmer
236) The information you put into the computer is called data
237) The output devices make it possible to view or print data
238) A chat is a typed conversation that takes place on a computer
239) Hardware includes the computer and all the devices connected to it that are used to input and
output data
240) The most common method of entering text and numerical data into a computer system is through
the use of a keyboard
241) Mouse, keyboard and scanner form a group consisting of only input devices (a plotter is an output device)
242) 256 values can be represented by a single byte
243) Transformation of input into output is performed by the CPU
244) Utility programs can handle most system functions that aren't handled directly by the
operating system
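Item 242 follows from the fact that a byte has 8 bits, so it can take 2 to the power of 8 distinct patterns. The short sketch below only illustrates that arithmetic; the function names are made up for this example and are not part of the notes.

```python
BITS_PER_BYTE = 8  # a byte is eight bits


def distinct_values(bits: int) -> int:
    """Number of different bit patterns that fit in the given number of bits."""
    return 2 ** bits


def unsigned_range(bits: int) -> tuple:
    """Smallest and largest unsigned integers representable in the given bits."""
    return 0, 2 ** bits - 1


if __name__ == "__main__":
    print(distinct_values(BITS_PER_BYTE))
    print(unsigned_range(BITS_PER_BYTE))
```

The same two functions give the figures for any word size, for example 16 or 32 bits.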

246) LAN refers to a small, single-site network
247) A set of instructions telling the computer what to do is called program
248) Data going into the computer is called input
249) If you receive an e-mail from someone you don't know, delete it without opening it
250) A binary choice offers two options
251) A collection of programs that controls how your computer system runs and processes information is
called operating system
252) Data in a spreadsheet is organized in rows and columns
253) When cutting and pasting, the item cut is temporarily stored in clipboard
254) When you are working on a document on a PC, the document is temporarily stored in RAM
255) One megabyte equals approximately 1 million bytes
266) A cluster represents a group of sectors

267) Digital signals used in ISDN have discrete values
268) Assembler is language translation software
270) Bandwidth means channel capacity: the amount of data flowing through a cable, and a measure of speed
271) Chip can keep maximum data
272) Debugging is the process of finding errors in software code
273) Time bombs are viruses that are triggered by the passage of time or on a certain date
274) Linux is an open source operating system
275) Boot sector viruses are often transmitted by a floppy disk left in the floppy drive
276) Operating system controls the way in which the computer system functions and provides a medium
by which users can interact with the computer
277) Servers are computers that provide resources to other computers connected to a network
278) Field names describe what a data field is
279) You must install router on a network if you want to share a broadband internet connection
280) A goal of normalization is to minimize redundancy
281) Programs from the same developer, sold bundled together, that provide better integration
and share common features, toolbars and menus are known as software suites
282) A data ware house is one that organizes important subject areas
283) URL term identifies a specific computer on the web and the main page of the entire site
284) A proxy server is used to process client requests for web pages
285) When data changes in multiple lists and not all lists are updated, this causes data inconsistency
286) Granting an outside organization access to internet web pages is often implemented using an
extranet
287) The code that relational database management systems use to perform their database tasks is
referred to as SQL
288) URL stands for Uniform resource locator
289) A data base management system is a software system used to create, maintain and provide
controlled access to a database
290) The two broad categories of software are system and application
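Items 287 and 289 describe SQL as the language a relational DBMS uses to create, maintain and query a database. As a rough illustration, the sketch below uses Python's built-in sqlite3 module with an in-memory database; the table name, column names and sample rows are assumptions chosen for the example and do not come from the notes.

```python
import sqlite3


def run_demo():
    """Create a table, insert rows, update one, and read them back with SQL."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE customer ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer (name, city) VALUES (?, ?)",
        [("abc", "Mumbai"), ("pqr", "Pune")],
    )
    # The address is stored once, so a single UPDATE keeps the data consistent.
    conn.execute("UPDATE customer SET city = ? WHERE name = ?", ("Delhi", "abc"))
    rows = conn.execute("SELECT name, city FROM customer ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, city in run_demo():
        print(name, city)
```

Because each customer appears in one row only, the update cannot leave a stale copy behind, which is the inconsistency problem item 285 refers to.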

291) The metal or plastic case that holds all the physical parts of the computer is known as system unit
292) Data integrity means that the data contained in the database is accurate and reliable
293) An intranet is a private corporate network used exclusively by company employees
294) Eight bits equal to one byte
295) A byte can hold one character data
296) A characteristic of a file server is that it manages file operations and is shared on a network
298) The development process of computer started in 1617
299) The role of Blaise Pascal in the development of computer is addition and subtraction of numbers
300) The inventor of computer is Charles Babbage
301) The IBM made the first electronic computer in 1953
302) The silicon-based transistor was first made by Gordon Teal in 1954
303) IBM is a company
304) The IBM personal computer was introduced in 1981
305) 1 Kilobyte is equal to 1024 bytes
306) LCD means liquid crystal display
307) UPS converts DC voltage into AC voltage
308) The word ZIP means Zigzag In-line Package
309) With the help of Pal switch we can increase or decrease the speed of computers processing
310) The full form of MICR is magnetic ink character recognition
311) SWIFT networks are used between banks
312) Panda is a virus
313) Boot sector is hardware
314) Debug searches the fault of any software
315) OCR means optical character recognition
316) The total printout of a program is called listing

317) With the help of ZIP we can decrease the size of any programs
318) We can format the hard disk with the help of Booting
319) CANNED is called as readymade software
320) The process of creating third file by mixing two files is called as mail merging
321) The figures and lines etc. made by computer is called as graphics
322) Each line represents 65 letters in a WordStar
323) Nokia- 7500 is not the example of Micro Processor
324) The name of the first commercial digital computer is UNIVAC
325) The modern computer was invented in 1946
326) The full form of DOS is disk operating system
327) The expanded form of FORTRAN is formula translation
328) The great revolution came in computer sector in 1960
329) Magnetic tape is called as Input device of computer
330) The first mechanical computer of Charles Babbage is known as punch card machine
331) The IC chip used in computer is generally made in silicon
332) Telephone broadcast is the example of simplex transmission
333) Optical, Mechanical are the kinds of mouse
334) Control panel is used for increasing and decreasing the speed of the cursor of mouse
335) The capacity of modern main frame digital computer is 10(to the power of -12) mbps
336) With the help of my computer we can know about the usage and availability of space in computer
337) We use both MS-Word and page maker for making resume
338) Earliest computers that would work with FORTRAN was second generation
339) Backups in a database are maintained to restore lost data
340) IDEA is an encryption technique
341) DBRM takes care of storage of data in a database

342) The job of DBMS is to decrease redundancy
343) Digital signatures use encryption for authenticating
344) OS acts as intermediary agency between user and hardware
345) Plotters give the highest quality output
346) ROM is built in memory in computer
347) FLASH is a RAM
348) PRAM is not a RAM
349) FLASH device is used in cell phones
350) Internal storage is same as the primary storage
351) IMAC is name of a machine
352) First generation computers could do batch processing
353) The analytic engine was created by Charles Babbage
354) Voicemail of GSM service provider has the personal security code for its subscribers
355) Senior manager decided about the division of work with respect to IT security
356) Encrypting File System of the Windows XP Professional operating system protects the data of a user,
even if the computer is shared between users
357) The .mpeg is the format of a movie file
358) Controlling is NOT required for high level language program before it is executed
359) A plotter is output device
360) 80286 is a hardware part of microprocessor
361) Top-bottom approach can not be the measure of network traffic
362) A switching mode power supply is used for converting raw input power to stabilized DC power
363) Spooler can manage the whole printing process
364) Validity routines control procedures can be used to ensure completeness of data
365) Less expensive than leased line networks is not a characteristic of virtual private networks (VPN)
366) Program policy framework provides strategic direction to an organization

367) Cross bar switches have common control
368) Row-level security is the most basic part for database security
369) Voice recognition software can not be used for converting text into voice
370) The user account can only be created by the network administrator
371) IBM-700 belongs to second generation
372) Allocating adequate bandwidth would help her in speeding up the data transfer over net
373) BCD means binary coded decimal
374) Extended system configuration data is same as BIOS
375) Digitizer is an input device
376) Carmel is a platform of the Intel Centrino microprocessor family
377) RISC stands for reduced instruction set computer; it is a processor design, not a storage device
378) NORTON is an anti-virus
379) The system file of an operating system is COM
380) ATMs of bank have real currency
381) A program that converts high level language to machine language is a compiler
382) .txt files can be made in notepad, MS word, DOS editor
383) .Zip is a compressed file
384) Internet is a WAN
385) MP3 technology compresses a sound sequence to one-twelfth of its original size
386) At a time only one operating system can be at work on a computer
387) If multiple programs can be executed at the same time, it is a multitasking operating system
388) If the operating system provides quick attention, it is real time operating system
389) Distributed operating system uses network facility
390) FORMAT command in MS-DOS is used for recreating disk information
391) COPY command in MS-DOS is used to copy one or more files in disk drive to another, copy from
one directory to another directory

392) REN command is Internal command
393) Tim Berners-Lee propounded the concept of the World Wide Web
394) The memory address sent from the CPU to the main memory over a set of wires is called address
bus
395) A modem is an electronic device required for the computer to connect to the Internet
396) A source program is a program which is to be translated into machine language
397) Virus in computer relates to program
398) Floppy is not a storage medium in the computer related hardware
399) DOS floppy disk does not have a boot record
400) The CPU in a computer comprises the store, the arithmetic and logic unit, and the control unit
401) In computer parlor a mouse is a screen saver
402) UNIVA is the name of the computer which was first used for programming and playing of music
403) The IC chips for computer is prepared from silicon

404) Database management systems are comprised of tables that are made up of rows called records and
columns called fields
405) Nano is equal to 10(to the power of -9)
406) In computers RAM is a volatile memory
407) Disk and tape drives are commonly used as hard copy
408) When a computer is connected to a LAN and data is sent across it for storage/processing, this is
online processing
409) The primary storage unit is also referred to as internal storage
410) Oracle is not an operating system
411) Data are raw facts and figures
412) Holding of all data and instructions to be processed is one of the functions of storage unit
413) To select the entire row in Excel, click mouse at row heading
414) Database is known as structured data
415) In PowerPoint, the notes pane appears in Normal view and Outline view

416) The user protection feature of an operating system is required in multi-user system only
417) In Word, pressing the Ctrl + Del key combination deletes an entire word
418) In MS-Word double clicking a word selects the word
419) Word document can be navigated in web layout view
420) In Excel, the addressing modes that can be used in a formula are absolute, relative and mixed
421) Note page views can you use to show just the slide and its contents
423) The computer as a machine and all other physical equipment associated with it are termed as
hardware
424) Plotters are very useful in applications such as computer aided design
425) Corel DRAW is a graphic package
426) The print to file option creates .prn file
427) The enhanced keyboard contains 101 keys
428) Data processing cycle consists of input cycle, output cycle and processing cycle
429) Page Setup is not an option of Edit menu
430) Radar chart is used to show a correlation between two data series
431) A computerized business information system includes hardware, software and data facts
432) Purchase order file is a transaction file
433) A typical computerized business application system will have both master and transaction file
434) Problem identification is taken first in designing a program
435) The purpose of the EXIT command is to get out of a condition loop
436) Employees details is a master file for the pay roll system
437) A slow memory can be connected to 8085 by using READY
438) A processor needs software interrupt to obtain system services which need execution of privileged
instructions
439) A CPU has two modes- privileged and non-privileged. In order to change the mode from the
privileged to the non-privileged, a software interrupt is needed
440) Swap space resides at disk
441) The process of assigning load addresses to the various parts of the program and adjusting the code
and data in the program to reflect the assigned addresses is called relocation
442) 1 Sector = 4096 bytes
443) Two stacks of size n are required to implement a queue of size n
444) 1 Floppy = 6, 30,784 bytes or 308 KB
445) Consider a machine with 64 MB physical memory and a 32-bit virtual address space. If the page size
is 4 KB, the page table has 2^32 / 2^12 = 2^20 entries; with 2-byte entries (enough for a 14-bit frame
number), the size of the page table is 2 MB
446) Consider a virtual memory system with FIFO page replacement policy. For an arbitrary page access
pattern, increasing the number of page frames in main memory will not always decrease the number of
page faults; with FIFO it can sometimes increase them (Belady's anomaly)
447) Consider a scheme R(A, B, C, D) and functional dependencies A->B and C->D. Then the
decomposition of R into R1 (AB) and R2 (CD) is dependency preserving but not lossless join
448) A disk requires a device driver; main memory, cache and registers do not
449) RAM can be divided into 2 types
450) Two friends suitably arrange 4 blocks of different colors to exchange coded information between
them. 4 bits of information is exchanged each time
451) Cache memory is a part of main memory
452) The number 43 in 2's complement representation is 00101011
453) The 8085 microprocessor responds to the presence of an interrupt by checking the TRAP pin for
high status at the end of each instruction fetch
454) All machinery and apparatus of computer is called hardware
455) 1024 bytes is equal to 1 kilobyte
456) System design specifications do not include blueprint showing the layout of hardware
457) Web pages are uniquely identified using URL
458) The results of arithmetic and logical operations are stored in an accumulator
459) The input device that is closely related to touch screen is the light pen
500) F2 keys of control center specified below displays data, toggles browse/edit
501) A compiler breaks the source code into a uniform stream of tokens by lexical analysis
502) The number of processes that may be running at the same time in a large system can be thousands
503) LET.BAS files are related to Microsoft word utility
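Item 446 concerns FIFO page replacement. A small simulation makes it easy to check how the number of page faults changes with the number of frames. The sketch below is only illustrative: the function name is made up for this example, and the reference string is the one commonly used in textbooks to demonstrate Belady's anomaly, where adding a frame increases the fault count.

```python
from collections import deque


def fifo_page_faults(reference_string, frame_count):
    """Count page faults for FIFO replacement with a fixed number of frames."""
    frames = deque()   # pages in arrival order, oldest on the left
    resident = set()   # same pages, for fast membership tests
    faults = 0
    for page in reference_string:
        if page in resident:
            continue   # hit: FIFO does not reorder pages on a hit
        faults += 1
        if len(frames) == frame_count:
            oldest = frames.popleft()   # evict the page loaded earliest
            resident.remove(oldest)
        frames.append(page)
        resident.add(page)
    return faults


if __name__ == "__main__":
    pages = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
    for count in (3, 4):
        print(count, "frames:", fifo_page_faults(pages, count), "faults")
```

For this reference string the simulation reports 9 faults with 3 frames and 10 faults with 4 frames.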

504) A command in UNIX can have one or more arguments
505) A hard disk is logically organized according to cylinders and sectors
506) A frame can include text & graphics, tables & graphics, graphics
507) All the formatting data for the paragraph is stored in the paragraph mark
508) The abbreviation CAD stands for computer aided design
509) We can define hypertext definition in notebooks using Macsyma 2.0
510) The addressing mode(s) that can be used in a formula is/are- absolute, relative and mixed
511) WINDOWS can work in enhanced and standard modes
512) The part of a machine level instruction which tells the central processor what has to be done is an
operation code
513) O-Matrix software packages do not have animation capabilities
514) In order to paste text from the clipboard in the document being edited, press the Ctrl-V key
515) A program that converts a high level language program to a set of instructions that can run on a
computer is called a compiler

516) Faster execution of programs is not an advantage of a subroutine
517) First generation of computer period is 1945-1950
518) IBM built first PC in the year 1981
519) A small computer program embedded within an HTML document when a user retrieves the web
page from a web server is called an applet
520) Another name for systems implementation is transformation
521) The central host computer or file server in a star network maintains control with its connecting
devices through polling
522) C++ does not check whether the index value is within range
523) The browser uses uniform resource locator to connect to the location or address of internet
resources
524) In the centralized computing architecture, the entire file is downloaded from the host computer to
the users computer in response to a request for data
525) A virtual reality system enables one or more users to move and react to what their senses perceive in a
computer simulated environment
526) Popping or removing an element from an empty stack is called underflow
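Item 526 can be illustrated with a small stack that reports underflow explicitly. This is a minimal sketch, assuming a list-backed stack; the class name Stack and the exception name StackUnderflowError are hypothetical and chosen only for this example.

```python
class StackUnderflowError(Exception):
    """Raised when pop is attempted on an empty stack."""


class Stack:
    def __init__(self):
        self._items = []   # top of the stack is the end of the list

    def is_empty(self):
        return not self._items

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if self.is_empty():
            # Nothing to remove: this condition is the underflow.
            raise StackUnderflowError("pop from an empty stack")
        return self._items.pop()


if __name__ == "__main__":
    stack = Stack()
    stack.push(10)
    print(stack.pop())
    try:
        stack.pop()
    except StackUnderflowError as error:
        print("underflow:", error)
```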

527) The ability to combine data and operations on that data in a single unit is known as encapsulation
528) A firewall is a device that sits between your internal network and the internet and limits access into
and out of your network based on your organization's access policy
529) In C++, private, protected, and public are reserved words and are called member access specifiers
530) The integration of all kinds of media such as audio, video, voice, graphics and text into one
coherent presentation combined is called multimedia
531) The derived class can redefine the public member functions of the base class
532) A technique for searching special databases, called data warehouses, looking for related
information and patterns is called data mining
533) Like the quick sort the merge sort uses the divide and conquer technique to sort a list
534) The use of expert systems technology can greatly reduce the number of calls routed to a customer
service department
535) Building a list in the backward manner, a new node is always inserted at the beginning of the list
536) Creating a web site is also called web authoring
537) Using the backward chaining approach, the expert system starts with a conclusion and tries to
verify that the rules, facts, and conclusion all match. If not, the expert system chooses another
conclusion
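Item 535 describes building a linked list in the backward manner, where every new node becomes the new head. A minimal sketch of that idea follows; the names Node, build_backward and to_list are assumptions made for this example.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_backward(values):
    """Insert each new node at the beginning of the list."""
    head = None
    for value in values:
        head = Node(value, head)   # new node points at the old head
    return head


def to_list(head):
    """Collect node values from head to tail for easy inspection."""
    result = []
    while head is not None:
        result.append(head.value)
        head = head.next
    return result


if __name__ == "__main__":
    print(to_list(build_backward([1, 2, 3])))
```

Since each insertion happens at the head, the finished list holds the values in the reverse of their insertion order.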

538) The term software describes both application programs and operating system programs
539) Root is not an item of the menu bar
540) BACKUP is not an internal command
541) In a disk, each block of data is written into one sector
542) Hard copy is a printed copy of machine output
543) A bootstrap is a small initialization program used to start up an inactive computer
544) CAD is oriented towards software
545) Icons are picture commands
546) IBM company for the first time launched pocket computers
547) PROM is a computer part
548) Utility programs can handle most system functions that aren't handled directly by the operating system
551) A collection of programs that controls how your computer system runs and processes information
is called operating system
552) When we are working on a document on a PC the document is temporarily stored in RAM
553) Information travels between components on the motherboard through buses
554) Microsoft is a vertical market application
555) RAM refers to the memory in your computer
556) Computers connected to a LAN can share information and / or share equipment
557) Magnetic tape is not practical for applications where data must be quickly recalled because tape is
a sequential access medium
558) In late 1988 computer viruses landed in India for the first time
559) ALU is a part of the CPU
560) In computer technology a compiler means a program, which translates source program into object
program
561) American computer company IBM is called big blue
562) The first IBM PC did not have any ROM
563) The digital computer was developed primarily in UK
564) Programs which protect a disk from catching an infection are called antidotes
565) The first movie with terrific computer animation and graphics, released in 1982, is Tron
566) An integrated circuit is fabricated on a tiny silicon chip
567) The word size of a microprocessor refers to the amount of information that can be stored in the
byte
568) Daisy-wheel printer cannot print graphics
569) In the IBM PC-AT, the word AT stands for Advanced Technology
570) Dedicated computer means which is assigned one and only one task
571) Real-time programming is the type of computer programming used for aeroplane ticket reservation
systems
572) RAM means memory which can be both read and written to
573) Laser printer uses light beam and electro statically sensitive black powder
574) The Santa Clara Valley, California is popularly known as the Silicon Valley of America because many
silicon chip manufacturing firms are located there
575) A program written in machine language is called assembler
576) International business machine was the first company in the world to build computer for sale
577) PARAM is a parallel computer
578) For communications, wide area networks use special purpose telephone wires and fiber optic
cables and microwaves
579) Data transfer rate in modems is measured in bits per second
580) A compiler cannot detect logical errors in source programs
581) Throughput, turnaround time, response time are measures of system performance
582) OLTP architecture can handle a limited number of dimensions whereas OLAP architecture does not
have any limit on the number of dimensions
583) The binary equivalent of (40.125) base 10 is 101000.001
584) Kernel is the only part of an operating system that a user cannot replace or modify
585) Symbol (figure not reproduced) signifies a magnetic disk
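The conversion in item 583 can be checked by converting the whole part by repeated division and the fractional part by repeated doubling. The sketch below assumes a non-negative input and caps the number of fraction bits; the function name and the cap are illustrative choices, not part of the notes.

```python
def to_binary(value, max_fraction_bits=16):
    """Convert a non-negative decimal number to a binary string."""
    whole = int(value)
    fraction = value - whole
    whole_bits = bin(whole)[2:]       # bin() returns e.g. '0b101000'
    fraction_bits = []
    # Doubling the fraction moves the next binary digit in front of the point.
    while fraction > 0 and len(fraction_bits) < max_fraction_bits:
        fraction *= 2
        bit = int(fraction)
        fraction_bits.append(str(bit))
        fraction -= bit
    if fraction_bits:
        return whole_bits + "." + "".join(fraction_bits)
    return whole_bits


if __name__ == "__main__":
    print(to_binary(40.125))
```

Here 40 gives 101000 and 0.125 gives .001 after three doublings (0.25, 0.5, 1.0), matching the value quoted in item 583.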

586) The Java programming language was initially developed for consumer electronics goods
587) Running, blocked, ready, terminated are different states of a process
588) Rational rose is an example of a modeling language
589) A WORM (write once, read many) disk is an example of an optical device
590) A RAID is a disk array
591) The first private internet service provider in India was Satyam infoway
592) The minimum and maximum unsigned numbers which can be stored in an 8-bit word are 0 and 255
593) Stack is a part of memory
594) HIT RATIO is associated with cache performance
595) Laser printer is a page printer
596) Storage capacity of a disk system depends upon number of recording surfaces and number of
sectors per track
597) Abstraction is associated with object oriented technology and database technology
598) The terms opcode and operand are associated with machine and assembly language instructions
599) Dynamic binding is associated with object oriented programming
600) The term CHIP, JEWELLARY means a processor with high capacity
601) A watch point is associated with debugger
602) A multithreaded program uses multiple processes
603) Time sharing is a mechanism to provide spontaneous interactive use of a computer system by
many users in such a way that each user is given the impression that he/she has his/her own computer
604) The typical scheme of memory management used in IBM OS/360 mainframe system was that of
multiprogramming with variable number of memory partitions
605) The concepts used for realization of virtual memory are swapping, demand paging and In-line
secondary storage
606) Oracle 8i is an example of ORDBMS
607) ALPHA, RIOS, SPARC are examples of RISC Processors
608) The scope of an identifier refers to where in the program an identifier is accessible
609) Hierarchy is not a component of relational database
610) A two-way selection in C++ is the if...else
611) A recursive function usually executes less efficiently than its iterative counterpart
612) The body of the recursive function contains a statement that causes the same function to execute
before completing the last call
613) Variables that are created during program execution are called dynamic variables
614) When destroying a list, we need a delete pointer to deallocate the memory
615) The first character in the ASCII character set is the null character, which is nonprintable
616) A variable for which memory is allocated at block entry and deallocated at block exit is called an
automatic variable
617) Signal to noise ratio compares signal strength to noise level
618) The ability to create new objects from existing objects is known as inheritance
619) Software tools that provide automated support for the systems development process are CASE
tools
620) Applications/Web server tier processes HTTP protocol, scripting tasks, performs calculations, and
provides access to data
621) A language used to describe the syntax rules is known as meta language

622) In a preorder traversal, the binary tree is traversed as follows: visit the node, traverse the left
subtree, then traverse the right subtree
623) The general syntax of the function prototype of a value returning function is functionType
functionName(parameter list);
624) Competitive intelligence is the process of gathering enough of the right information in a timely
manner and usable form and analyzing it so that it can have a positive impact
625) Tracing values through a sequence is called a play out
626) In a binary tree, each comparison is drawn as a circle, called a node
627) The term used as a measurement of a communication channel's data capacity is bandwidth
628) In addition to the nature of the problem, the other key factor in determining the best solution
method is function
629) An E-R data model solves the problem of presenting huge information system data models to
users and developers
630) In C++, predefined functions are organized into separate libraries
631) The standard protocol (communication rules for exchange of data) of the internet is TCP/IP
632) For efficiency purposes, wherever possible, you should overload operators as member functions
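Item 622 refers to preorder traversal of a binary tree, in which a node is visited before its left and right subtrees. The sketch below shows one recursive way to write it; the TreeNode class and the sample tree are assumptions made for this example.

```python
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def preorder(node):
    """Return values in preorder: node first, then left subtree, then right."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


if __name__ == "__main__":
    #        1
    #       / \
    #      2   3
    #     / \
    #    4   5
    root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
    print(preorder(root))
```

The function calls itself before the current call has finished, which is the behaviour item 612 describes for recursive functions.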

633) Modifying algorithms that change the order of the elements, not their values, are also called
mutating algorithms
634) As long as the tables in a relational database share at least one common data attribute, the tables
in a relational database can be normalized to provide useful information and reports
635) SQL is a simple, fourth generation language used for data retrieval
636) An occurrence of an undesirable situation that can be detected during program execution is known
as an exception
637) A photo of the company headquarters would be an example of the use of a static web page
638) The Pentium processor contains millions of transistors
639) SOI technology has been successful with reference to SRAM
640) Secondary storage device is needed to print output results
641) Static random access memory does not fall under the category of secondary storage devices
642) Floppy disk is universal, portable and inexpensive but has its own limitation in terms of storage
capacity and speed
643) Some physical property of the microscopic area of the disk surface is changed for recording in
common for all types of disks

644) In a disk, each block of data is written into two or more sectors
645) A floppy disk rotates at 100rpm
646) A hard disk has 500 to 1000 tracks or more
647) The storage capacity of a cartridge tape is 400 MB
648) Single density recording is also known as frequency modulation
649) Printer is not an input device
650) The input device that is most closely related to a touch screen is the light pen
651) Dot matrix printer generates characters from a grid of pins
652) The liquid crystal display works on the basis of the relation between polarisation and electric field
653) A COBOL program in source code is not considered to be system software
654) Firmware stored in a hard disk
656) Sorting of a file tasks is not performed by a file utility program
657) Floppy disk does not generate a hardware interrupt
658) Ada language is associated with real time processing
659) MS DOS is usually supplied on a cartridge tape
660) BREAK is not an internal DOS command
661) Kernel of MS-DOS software resides in ROM
662) The UNIX operating system (available commercially) has been written in C language
663) MS-DOS has better file security system as compared to UNIX
664) UNIX is only a multiprogramming system
665) The UNIX operating system uses three files to do the task mentioned
666) In UNIX, end-of-file is indicated by typing CTRL D
667) Abacus is said to be invented by Chinese
668) An operating system is necessary to work on a computer

669) The first UNIX operating system, as it was in the development stage, was written in the assembly
language
670) FAST drivers scientific software packages was developed under contract with NASA
671) LEFT () is not a date function
672) PIF editor is a Windows-based application
673) Graphics is inserted in frame
674) A language translator is best described as a system software
675) The specification of a floppy is identified by TPI
676) DISC () is not a database function
677) In opening menu of word star C OPTION should be selected for protecting a file
678) The most advanced form of ROM is electronically erasable programmable ROM
679) Secondary storage device is needed to store large volumes of data and programs that exceed the
capacity of the main memory
680) MORE sends contents of the screen to an output device

681) NFS stands for Network File System
682) Main protocol used in internet is TCP/IP
683) We can create a simple web page by using front page express
684) The first line/bar on the word window where the name of the document is displayed is called title
bar
685) The clock frequency of a Pentium processor is 50 MHz
686) The input device that is most likely to be used to play computer games is the joystick
687) Linking the program library with main program is not performed by a file utility program
688) The UNIX operating system has been written in C language
689) BIOS of MS-DOS resides in ROM
690) The sector size of a floppy disk varies from 128 bytes to 1024 bytes
691) Syntax errors in a program are flagged by compilers
692) A floppy diskette is organized according to tracks and sectors
693) In word star, the maximum permissible length of search string is 65
694) C is a third generation high level language
695) A CPU has a 16-bit program counter. This means that the CPU can address 64 K memory locations
696) STR () is used for converting a numeric into a character string
697) BASIC language is normally used along with an interpreter
698) In UNIX, open files are shared between the parent and the child
699) In spite of the extra power needed for refreshing, DRAMs are widely used in computers because of
their lower cost relative to SRAMs
700) PIF editor belongs to Main group
701) SUM () is not a financial function
702) 98/04/12 cannot be used to enter a date
703) Being expensive is not a reason for the popularity of Windows
704) Personal computers currently sold in India have main memories at an entry level in the range of
megabytes
705) The unit in CPU or processor, which performs arithmetic and logical operations is ALU
706) RAM is volatile
707) The result of arithmetic and logical operations is stored in an accumulator
708) A small amount of memory included in the processor for high speed access is called cache
709) A bus is an electronic track system
710) A co-processor is used to improve the speed of mathematical calculations
711) Intel 80286 belongs to third generation microprocessors
712) A hexadecimal digit can be represented by four binary bits
713) The number of processes that may be running at the same time in a large system can be thousands
714) FORTRAN is a 3GL
715) Root is not an item of the Menu bar
716) Difficult to do what it projects is not considered to be a feature of the spreadsheet
717) While starting the Lotus 1-2-3, the current cell reference is shown at top left hand corner of the
screen
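
The arithmetic behind item 695 above (a 16-bit program counter addressing 64 K locations) can be checked with a short sketch. This is only an illustration; the function name addressable_locations is a hypothetical helper, not something taken from these notes.

```python
def addressable_locations(counter_bits: int) -> int:
    """Number of distinct addresses an n-bit program counter can hold."""
    # Each additional bit doubles the number of distinct values.
    return 2 ** counter_bits


if __name__ == "__main__":
    locations = addressable_locations(16)
    # 1 K is taken as 1024 here, the usual convention for memory sizes.
    print(locations, "locations =", locations // 1024, "K")
```

The same function applies to any counter width; only the 16-bit case is stated in the notes.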

718) Bill gates is the chief of Microsoft
719) Excel office assistant can be made to appear by using F1 key and help menu
720) 9 is the maximum number of slides that can be shown per page in handouts
721) Alignment buttons are available on formatting toolbar
722) A picosecond is 10 (to the power of -12) second
723) Logo refers to a computer language
724) The most appropriate command to permanently remove all records from the current table is Zap
725) The efficient and well designed computerized payroll system would be on-line real time system
726) The scripts which are designed to receive value from Web users is CGI script
727) WAIS stands for Wide Area Information Servers
728) Modem is used for connecting PC using telephone lines
729) Token bus is the most popular LAN protocol for bus topology
730) Manipulating data to create information is known as analysis
731) A separate document from another program sent along with an E-mail message is known as an E-mail
attachment
732) When you boot up a PC portions of the operating system are copied from disk into memory
733) Correcting errors in a program is called debugging
734) A word processor would most likely be used to type a biography
735) A database is an organized collection of data about a single entity
736) Fire fox is a web browser
737) Most of the commonly used personal computers/laptops do not have a command key known as
turnover
738) Full form of USB is known as Universal serial bus
739) The quickest and easiest way in Word to locate a particular word or phrase in a document is to use
the find command
740) Computer sends and receives data in the form of digital signals
741) Icons are graphical objects used to represent commonly used application
742) Most World Wide Web pages contain commands in the HTML language
743) In any window, the maximize button, the minimize button and the close buttons appear on the title
bar
744) Checking that a pin code number is valid before it is entered into the system is an example of data
validation
745) Windows 95 and windows 98 and Windows NT are known as operating systems
746) Information on a computer is stored as digital data
747) A program that works like a calculator for keeping track of money and making budgets is
spreadsheet
748) To take information from one source and bring it to your computer is referred to as download
749) Windows is not a common feature of software applications
750) A toolbar contains buttons and menus that provide quick access to commonly used commands
751) Input device is an equipment used to capture information and commands
752) Most of the commonly available personal computers/laptops have a keyboard popularly known as
QWERTY
753) Editing a document consists of reading through the document you've created, then correcting
your errors
754) Virtual Box is not a famous operating system
755) Junk e-mail is also called spam
756) DOC is the default file extension for all word documents
757) .bas, .doc and .htm are examples of extensions
758) Codes consisting of bars or lines of varying widths or lengths that are computer readable are known
as a bar code
759) Convenience, speed of delivery, generality and reliability are all considered advantages of e-mail
760) E-commerce allows companies to conduct business over the internet
761) The most important or powerful computer in a typical network is network server
762) To make a notebook act as a desktop model, the notebook can be connected to a docking station
which is connected to a monitor and other devices
763) Storage that retains its data after the power is turned off is referred to as non-volatile storage
764) Virtual memory is memory on the hard disk that the CPU uses as an extended RAM
765) To move to the beginning of a line of text, press the home key
766) When sending an e-mail, the subject line describes the contents of the message
767) Microsoft Office is an application suite
768) Information travels between components on the motherboard through buses
769) One advantage of dial-up internet access is that it utilizes existing telephone service
770) Network components are connected to the same cable in the bus topology
771) Booting checks to ensure the components of the computer are operating and connected properly
772) Control key is used in combination with another key to perform a specific task
773) Scanner will translate images of text, drawings, and photos into digital form
774) Information on a computer is stored as digital data
775) Programs and data are kept in main memory while the processor is using them
776) The storage unit provides storage for information and instructions
777) The Help menu button exists in the Start menu

778) Microsoft company developed MS Office 2000
779) Charles Babbage is called the father of modern computing
780) Data link layer of OSI reference model provides the service of error detection and control to the
highest layer
781) Optical fiber is not a network
782) OMR is used to read choice filled up by the student in common entrance test
783) A network that spreads over cities is WAN
784) File Manager is not a part of a standard office suite
785) A topology of computer network means cabling between PCs
786) In UNIX command Ctrl + Z is used to suspend current process or command
787) Word is the word processor in MS Office
788) Network layer of an ISO-OSI reference model is for networking support
789) Telnet helps in remote login
790) MS Word allows creation of .DOC type of documents by default
791) In case of MS-access, the rows of a table correspond to records
792) Record maintenance in database is not a characteristic of E-mail
793) In a SONET system, a regenerator removes noise from a signal and can also add/remove
headers
794) The WWW standard allows programs on many different computer platforms to show the
information on a server. Such programs are called Web Browsers
795) One of the oldest calculating device was abacus
796) Paint art is not a special program in MS Office
797) Outlook Express is an e-mail client, scheduler and address book
798) The first generation computers had vacuum tubes and magnetic drum
799) Office Assistant is an animated character that gives help in MSOffice
800) AltaVista was created by a research facility of Digital Equipment Corporation of the USA
801) We are shifting towards computerization because technologies help in meeting the business
objectives

802) Spider search engines continuously send out a program that starts on the homepage of a server and
pursues all links stepwise
803) Static keys make a network insecure
804) Joy Stick is an input device that cannot be used to work in MS Office
805) Artificial intelligence can be used in every sphere of life because of its ability to think like human
beings
806) To avoid the wastage of memory, the instruction length should be of word size which is multiple of
character size
807) Electronic fund transfer is the exchange of money from one account to another
808) Format menu in MS Word can be used to change page size and typeface
809) Assembly language programs are written using Mnemonics
810) DMA module can communicate with CPU through cycle stealing
811) A stored link to a web page, in order to have a quick and easy access to it later, is called bookmark
812) B2B type of commerce is characterized by low volume and high value transactions in banking
813) Advanced is not a standard MS Office edition

814) Workstation is single user computer with many features and good processing power
815) History list is the name of list that stores the URLs of web pages and links visited in past few days
816) FDDI access mechanism is similar to that of IEEE 802.5
817) MS Office 2000 included a full-fledged web designing software called FrontPage 2000
818) Macintosh is Apple's microcomputer
819) X.21 is physical level standard for X.25
820) Enter key should be pressed to start a new paragraph in MS Word
821) Main frame is most reliable, robust and has a very high processing power.
822) Formatting of these toolbars allows changing of Fonts and their sizes
823) The ZZ command is used to quit editor after saving
824) The program supplied by VSNL when you ask for internet connection for the e-mail access is pine
825) The convenient place to store contact information for quick retrieval is the address book
826) Digital cash is not a component of an e-wallet
827) For electronic banking, we should ensure the existence and procedures with regard to
identification of customers who become members electronically
828) John von Neumann developed the stored-program concept
829) Hardware and software are mandatory parts of complete PC system
830) Firewall is used in PC for security
831) Two rollers are actually responsible for movement of the cursor in mouse
832) In case of a virus getting into computer NORTON will help
833) A tour of the server room is to be done by the auditor during an internet banking services audit
834) Documentation while developing a software for a Bank is required for auditing
835) Water supply has not become computerized
836) Concurrency control in distributed database supports multi-user access
837) Fifth generation computers are knowledge processors
838) Transistors were first used in 2nd generation computers

839) Intelligence is not a characteristic of a computer
840) A camera is a processing machine
841) To protect organization from virus or attacks all mails sent and received should be monitored, all
messages should be encrypted, E- mails should be used only for official purpose
842) Internet connects millions of people all over the world
843) A computer based information system is a system in which a computer is used to process data to
get information
844) The time between program input and outputs is called execution time
845) Third generations of computers have On-line real time systems
846) MIME is a compressor that packages different formats into SMTP compatible type
847) The earliest software was developed using the waterfall model
848) EDI e- commerce system can handle non monetary documents
849) A collection of tracks at the same position on each disk surface forms a cylinder
850) A disk where number of sectors are fixed is called hard sectored

851) Half duplex transmission techniques let computers alternately send and receive data
852) Multiplexing combines signals from different sources into one and sends on a faster channel
853) Message switcher chooses correct data path for an incoming message and forwards it to relevant
line
854) Speech recognition use thermal sensors along with infrared rays for identification
855) Worms are self-replicating malicious code, independent of the action of the user, that slow
down the processor on entering a network
856) Generation of PIN in bank ATM would require PIN entered is encrypted
857) Availability, integrity, confidentiality is most necessary for data to be useful
858) Grid is a supercomputer created by networking many small computers
859) A character that changes its value throughout the program is called a variable
860) A program coded in programming is done by assembling
861) In write mode of file existing text is replaced by new one
862) When an organization gives contract for development of a software, it has to give data to the
service provider. In such cases, the ownership of data should be with the client/organization that
outsource services
863) Under a centralized organization, an intranet can be an effective networking tool
864) For optical fiber used in point to point transmission, the repeater spacing is 10-100 km
865) Favorites are accessible from the start menu
866) Task pre-emption, task priority and semaphores are not needed by server program from an
operation system
867) The objective of multiprogramming operating system is to maximize CPU utilization
868) The environment provided to ASP is based on Client/server
869) HUB is layer1 device, central device, dumb device
870) In UNIX, the echo command is used to display a message or the value of any variable on the screen
871) QAM is used in high speed modem
872) Frame Relay technique is connection oriented
873) Unipolar encoding always has a non-zero average amplitude
874) In a SONET system, an add/drop multiplexer can remove signals from a path

875) The server on the internet is also known as Host
876) For multiple branching in C we use switch statement
877) Web site is a collection of HTML documents, graphic files, audio and video files
878) The first network that initiated the internet was ARPANET
879) In MODEMS a digital signal changes some characteristic of a carrier wave
880) The binary values are represented by two different frequencies in frequency shift keying
881) Messenger mailbox is present in Netscape communicator
882) Switching is a method in which multiple communication devices are connected to one another
efficiently
883) A bridge recognizes addresses of layer 2
884) EPROM is permanent storage device
885) .TIF extension name stands for tagged image file format
886) The overhead using BRI is 10 percent of the total data rate
887) In programming languages the key word Void means it does not return any value when finished
888) The keyboard shortcut to restart your computer is Ctrl + Alt + Del
889) FORTRAN is not a programming language
890) The instruction LOAD A is a one address instruction
891) MS-Excel is also known as spread sheet
892) Manchester encoding is used in Ethernet technology
893) The instruction of a program which is currently being executed are stored in main memory
894) In DOS environment, the command used to save the file is ^Z
895) All high level language uses compiler and interpreter
896) In html coding <p> </p> tag is used to display a paragraph
897) In HTML coding, the attributes color, size and face are used in the font tag
898) DHTML stands for dynamic hyper text markup language
899) Fiber optic cable supports data rates from 100 Mbps up to 2 Gbps

900) In Photoshop software we can modify, delete, and edit the image
901) Most common channel used by networks today is telephone lines
902) Sybase SQL Server and Microsoft SQL Server 7.0 are examples of RDBMS
903) In programming languages, a null pointer is used to mark the end of a linked list
904) A technique which collects all deleted space onto free storage list is called garbage collection
905) Node to node delivery of the data unit is the responsibility of the data link layer
906) Insulating material is the major factor that makes co axial cable less susceptible to noise than
twisted pair cable
907) A data communication system covering an area the size of a town or city is MAN
908) Virtual memory system allows the employment of the full address space
909) The basic circuit of ECL supports the OR-NOR logic
910) Micro instructions are kept in control store
911) In HTML coding, the NOSHADE attribute of the HR tag suppresses the shading effect and yields a solid line
912) Internet domains are classified by their functions. In that regard .com represents commercial
913) HTTP in URL stands for hyper text transfer protocol
914) The Nyquist theorem specifies the minimum sampling rate to be twice the bandwidth of a signal
915) Memory allocation at the routine is known as dynamic memory allocation
916) In HTML coding, <BR> tag is used for displaying a new line
917) SMTP protocol is used by internet mail
918) A policy on firewalls needs not ensure that it is logically secured
919) The script which is designed to receive value from the web users is java script
920) GET method and HEAD method are CGI methods of invoking a CGI program
921) Analog switched line telephone service is least expensive
922) A tool used to find a synonym or antonym for a particular word is thesaurus
923) In C++ coding, cout << text; is used to display characters, strings or numbers on the screen
924) In batch processing, a number of jobs are put together and executed as a group
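
Item 914 above (the Nyquist minimum sampling rate) reduces to a one-line calculation. The sketch below assumes a baseband signal whose bandwidth equals its highest frequency; the function names and the 4000 Hz voice-channel figure are illustrative assumptions, not values from these notes.

```python
def nyquist_rate(highest_frequency_hz: float) -> float:
    """Minimum sampling rate (samples per second) for a baseband signal."""
    return 2.0 * highest_frequency_hz


def is_adequately_sampled(sampling_rate_hz: float, highest_frequency_hz: float) -> bool:
    """True when the sampling rate meets or exceeds the Nyquist rate."""
    return sampling_rate_hz >= nyquist_rate(highest_frequency_hz)


if __name__ == "__main__":
    voice_bandwidth_hz = 4000.0  # assumed figure for a telephone voice channel
    print("minimum sampling rate:", nyquist_rate(voice_bandwidth_hz), "samples/s")
    print("6000 samples/s sufficient?", is_adequately_sampled(6000.0, voice_bandwidth_hz))
```

Sampling below this rate makes distinct frequencies indistinguishable after sampling, which is why the theorem is stated as a minimum.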

925) The process of finding and correcting errors in a program is a process called debugging
926) passwd command is used to change your password in UNIX system
927) HTML code always starts with <html> and ends with </html>
928) If there are 5 routers and 6 networks in an internetwork using link state routing, there will be 5
routing tables
929) A scripting language similar to HTML and which runs only on a browser is java script
930) By RAID technique, data is stored in several disk units by breaking them into smaller pieces and
storing each piece in separate disk
931) The most popular network protocol whose routing capabilities provide maximum flexibility in an
enterprise wide network is known as TCP/IP
932) Newsgroups enable you to communicate with other Microsoft Windows users about issues and
concerns with your computer
933) Analog-digital conversion type needs sampling of a signal
934) In an optical fiber, the inner core is more dense than the cladding
935) Six types of heading are available in HTML
936) RDBMS is an acronym for relational database management system
937) In MS-Word, page layout view is also known as true WYSIWYG
938) In HTML coding, <UL> (Unordered list) is used to give bullets in your document
939) Transmission media are usually categorized as guided or unguided
940) A transmission path is the physical connection between an end point and a switch or between two
switches
941) Passing of the frame to next station can happen at a token ring station
942) R-C coupling method is suitable for voltage amplification
943) Normal is not a type of HTML pages
944) In HTML coding <LI> tag is used for denoting items in a list of type <UL>
945) In MS-Word the keyboard shortcut F7 used for spelling and grammar check
946) DBMS is not an operating system
947) HTML is basically used to design web-site
948) In HTML coding, static web pages are created in advance of the user's request

949) In DOS, the primary name of a file can have a maximum of 8 characters
950) du command is used to show file system disk usage in UNIX
951) Maximum length of a text field is 255 characters
952) Frame format of CSMA/CD and Ethernet protocol differ in the block error control
953) On an Ethernet LAN implementation with 10 base 5 the maximum number of segments can be five
954) Overflow condition in linked list may occur when attempting to create a node when the free space
pool is empty
955) Code segment register is where the microprocessor looks for instruction
956) Web-site is collection of web-pages and Home-page is the very first page that we see on opening of
a web-site
957) The subsystem of the kernel and hardware that cooperates to translate virtual to physical
addresses comprises memory management subsystem
958) A device operating at the physical layer is called a repeater
959) FORTRAN is a mathematically oriented language used for scientific problems
960) If we want to convert the text which is in small letters to capital letters then select the required
text and press Shift +F3
961) Circuit switching uses the entire capacity of a dedicated link
962) In the datagram approach to packet switching, each packet of a message need not follow the same
path from sender to receiver
963) FDM technique transmits analog signals
964) X.21 protocol consists of only physical level
965) In a dedicated link, the only traffic is between the two connected devices
966) In a mesh topology, if there are n devices in the network, each device has n-1 ports for cables
967) A unique number assigned to a process when the process first starts running is the PID
968) Modems is necessary for multiplexing
969) In MS-Word, WYSIWYG stands for what you see is what you get
970) The primary purpose of shutdown procedure in UNIX system is that all active process may be
properly closed
971) In time- division circuit switching, delivery of data is delayed because data must be stored and
retrieved from RAM
972) Subnet usually comprises layers 1 & 2, layer 1 through 3 of OSI model
973) An image in a web-page can be aligned left and right using HTML coding
974) RFC stands for request for comment
975) Packet filtering firewall and proxy firewall are types of firewall
976) Most news readers present newsgroup articles in threads
977) The sharing of a medium and its path by two or more devices is called multiplexing
978) Sending messages, voice, and video and graphics files over digital communication link is done by
the method e-mail
979) In a computer network, a computer that can control a group of other computers for sharing
information as well as hardware utilities is known as server
980) Telephone number, zip code is defined as a numeric field
981) In shell programming, tr command is used for character translation
982) cat test >> output would append a file called test to the end of a file called output
983) In a network with 25 computers, mesh topology would require the more extensive cabling
984) Dialog control is a function of the session layer
985) The program which takes user input, interprets it and takes necessary action is shell
986) Most appropriate data structure in C to represent linked list is structure (struct)
987) Menu bar is usually located below the title bar and provides categorized options
988) Latest version of Microsoft Word is Word XP
989) You save your computer files on disc and in folders
990) When the text automatically goes onto the next line, this is called word wrap
991) WYSIWYG is short for what you see is what you get
992) Left justify is the same as align left
993) To put text on the right of the page use the align right button
994) Lotus 1-2-3 is a popular DOS based spreadsheet package
995) 65,535 characters can be typed in a single cell in excel
996) Comments put in cells are called cell tip

997) Getting data from a cell located in a different sheet is called referencing
998) A numeric value can be treated as a label value if it precedes with apostrophe
999) Data can be arranged in a worksheet in an easy to understand manner using auto formatting,
applying styles, changing fonts
1000) An excel workbook is a collection of worksheets and charts
1001) Most manufacturers setup their BIOS to load into upper memory during the boot process
1002) Device drivers loaded in the config.sys file are loaded into the following memory area: Conventional
memory
1003) 40ns memory speeds is the fastest
1004) System software often uses the ROM BIOS
1005) In CMOS setup, if you enable shadowing ROM is copied to RAM
1006) Static variables are local to the block in which they are declared.
1007) During the normal PC boot process, ROM BIOS is active first
1008) During boot-up, the memory test checks and verifies that contiguous memory is installed

1009) 601 error code identifies a floppy drive problem
1010) If you get frequent general protection faults, this could indicate poor quality of memory chips
1011) You are looking at a memory module thought to be a DIMM module. 168 pins would be on a
DIMM module
1012) The system BIOS and ROM chips are called firmware
1013) Extended memory is located above the first 1024K of memory
1014) WRAM type of RAM is normally the fastest
1015) RAM component is used for short-term data storage
1016) A SIMM has 30 or 72 pins
1017) RAM provides quickest access to data
1018) Narrowcast linking is not a transmission technology
1019) The data flow diagram is for analyzing requirements of user
1020) The elements of computer processing system are hardware, data, users and procedures

1021) On August 23, 2005 an accounting clerk prepared an invoice dated August 31, 2005. Range check
control can check this
1022) Library management software is for documenting the changes that are made to program and
controlling the version numbers of the programs
1023) Steganography is hiding the data but not necessarily making it invisible and not easily detectable
1024) A computer is an electronic device
1025) An online transaction is transaction done via internet
1026) Using anti-virus software is preventive measure
1027) For security we should consider local data reduction, event correlation, low resource utilization
1028) OS is not a peripheral of PC
1029) The most common input device used today is keyboard
1030) The third generation of computers spanned 1965-1971
1031) Gateways to allow a network to use the resources of another main frame is a component of
internet
1032) Mouse cannot be shared

1033) EDI means electronic data interchange
1034) In a mainframe network, a huge computer does all computing and front end PCs are dumb
terminals
1035) A modem that cannot be moved from its position is called fixed modem
1036) A device that receives data from slow speed devices, and transmits it to different locations is
called remote concentrator
1037) An organization would prefer in-house development of software to ensure that the development
adheres to defined quality
1038) Actual intelligence is not a feature of PC
1039) Network that uses two OSI protocol layers as against three used in X.25 is a frame relay
1040) Microsoft excel is versatile application and spread sheet program
1041) System flowcharts show relationship that link the input processing and output of the system
1042) Penetration testing is done after identifying the system to be tested
1043) Platform in computer world means computer hardware and operating systems
1044) A character that retains its value during program execution is called a constant

1046) OMR is used to read choice filled up by a student in common entrance tests
1047) The term remote with respect to network means machine located far off
1048) In two-tier client server architecture the client is usually fat client
1049) The senior management provides the go-ahead approval for the development of projects
1050) Manual data can be put into computer by Scanner
1051) E-mail address is made up of two parts
1052) The normal way to undo a command by pressing the following key combinations together CTRL-Z
1053) The owner of a process is user that invokes the process
1054) In datagram packet switching all the datagrams of a message follow the same channel of a path
1055) X.25 LAPB uses a specific subset of HDLC protocol
1056) Presentation layer of the OSI reference model is concerned with the syntax of data exchanged
between application entities
1057) Edge-triggered D flip-flop memory elements use an RC circuit at their input
1058) Programs that extend the capabilities of a server are CGI scripts
1059) The primary goal of ISDN is the integration of voice services and non-voice services
1060) Flow control in OSI model is done by transport layer
1061) The optical link between any two SONET devices is called a section
1062) A user can get files from another computer on the internet by using FTP
1063) The key fields which are tested by a packet filtering firewall are source IP address , TCP/UDP
source port, destination IP address
1064) The server on the internet is also known as gateway
1065) VBScript can perform calculation of data
1066) In MS-Word, mail merge can be defined writing a letter once and dispatching it to a number of
recipients
1067) Coaxial cables are good for digital transmission and long distance transmission
1068) LRU is a page replacement policy used for memory management
1069) Commercial TV is an example of distributive services with user control
1070) The exact format of frame in case of synchronous transmission depends on whether transmission
scheme is either character oriented or bit oriented
1071) RING topology is least affected by addition/remove of a node
1072) EX-OR gates recognize only words that have an odd number of 1s
1073) To interconnect two homogenous WANs we need a router
1074) Co-axial cables provide data rates over 50 Mbps
1075) The outermost orbit of an atom can have a maximum of 8 electrons
1076) The protocol for sharing hypertext information on the world wide web is HTTP
1077) ISDN's basic rate interface (BRI) is also known as 2B + D
1078) The mode of data transmission of unshielded twisted pair cable is full duplex
1080) Query is used to answer a question about a database
1081) AM and FM are examples of analog to analog modulation
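
Item 1068 above names LRU as a page replacement policy. A minimal sketch of how it counts page faults follows; the function name, the reference string and the frame count are illustrative assumptions rather than anything given in these notes.

```python
from collections import OrderedDict


def lru_page_faults(reference_string, frame_count):
    """Count page faults under least-recently-used replacement."""
    frames = OrderedDict()  # keys are resident pages, ordered oldest use first
    faults = 0
    for page in reference_string:
        if page in frames:
            # Hit: mark the page as the most recently used.
            frames.move_to_end(page)
        else:
            faults += 1
            if len(frames) >= frame_count:
                # Evict the page whose last use is furthest in the past.
                frames.popitem(last=False)
            frames[page] = True
    return faults


if __name__ == "__main__":
    print(lru_page_faults([1, 2, 3, 1, 4, 2], frame_count=3))
```

An OrderedDict is used here only because it keeps recency order cheaply; a real memory manager would approximate LRU with hardware reference bits.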

1082) Redundancy is the concept of sending extra bits for use in error detection
1083) The physical layer is concerned with transmission of bits over the physical medium
1084) The number of input lines required for an 8 to 1 multiplexer is 8
1085) The bar-code (rectangular pattern of lines of varying width and spaces) used for automatic
product identification by computer
1086) ASK is most affected by noise
1087) Stack is a LIFO structure
1088) CPU is not an input device of a computer
1089) Program of a computer presented as a sequence of instructions in the form of binary numbers is
called machine language
1090) Possible problems with JavaScript can be security or limited graphics and multimedia capabilities
1091) For locating any document on the WWW, there is a unique address known as a URL
1092) Design view is used to define a table and specify fields
1093) Traversal process is faster for threaded trees compared with their unthreaded counterparts
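Item 1087 describes a stack as a LIFO (last in, first out) structure. As a small sketch, a Python list can stand in for a stack, with `append` as push and `pop` as pop; the variable names are illustrative only.

```python
stack = []

# Push three items; the last one pushed sits on top.
for item in ("a", "b", "c"):
    stack.append(item)

# Pop until empty; items come back in reverse order of insertion.
popped = []
while stack:
    popped.append(stack.pop())

print(popped)
```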

1094) The command used to display help on any particular command is man
1095) In C++ programming, the extension of program is .cpp
1096) E-commerce is a generic term that refers to the combination of all commercial transactions executed over
electronic media for the exchange of products and services
1097) In DOS, the command used to create a new file is called copy con
1098) Backup helps you to create a copy of the information on your hard disk and saves original data in
case data on your computer got damaged or corrupted due to malfunctioning of hard-disk
1099) LAN is usually privately owned and links the devices in a single office, building or campus
1100) In justified type of alignment, text gets aligned along both left and right margins
1101) The internal programming language for a particular chip is called machine language
1102) The inner core of an optical fiber is glass or plastic in composition
1103) When a small amount of trivalent impurity is added to a pure-semiconductor it is called P-type
semiconductor
1104) In MS-Access, a table can have only one primary key
1105) In DOS, Deltree command is used to delete all the files as well as sub-directories of a directory

1106) Netscape navigator is a web-browser
1107) Multiplexing involves one path and one channel
1108) Table, form, queries, reports, macros, modules are objects in an access database
1109) The chown command in UNIX changes the owner of a file
1110) BCD stands for binary coded decimal
1111) When we run a program in HTML coding, notepad is used as backend and internet-explorer works
as front end
1112) If the M bit in X.25 standard is set to 1, it means that there is more than one packet
1113) The modem is a device that connects n input stream outputs
1114) Array is linear data structure
1115) A T.V. broadcast is an example of simplex transmission
1116) Search engine will search its database to find items whose text contains all or at least one of the
words given to it
1117) In UNIX, command !! is used to repeat the entire last command line

1118) PCM is an example of analog to digital
1119) A simple protocol used for fetching an e-mail from a mailbox is POP 3
1120) For a small web site, one needs to buy space from the ISP
1121) An operating system that acts as an intermediary between user and computer hardware
1122) Altair, the world's first personal computer, was introduced in the year 1975
1123) In half duplex, data flows in both directions, but in only one direction at a time
1124) A star topology requires a central controller or hub
1125) The OSI model consists of seven layers
1126) The main job of the scheduler is to allocate the CPU to processes
1127) 10 and 500 are the valid minimum and maximum zoom sizes (in percent) in MS Office
1128) Before printing a document you should always use print preview
1129) Excel XP is the latest version of excel
1130) A worksheet can have a maximum of 256 columns
1131) Character is not a valid data type in Excel
1132) Formula bar in an Excel window allows entering values and formulas
1133) Direct memory access is a technique for transferring data from main memory to a device without
passing it through the CPU
1134) Four 30-pin SIMMs are required to populate a bank on a 486 system that has a 32-bit data bus
1135) SDRAM uses a clock to synchronize a memory chip's input and output signals
1136) Cycle-stealing type of DMA transfer will operate when a CPU is operating
1137) A series 100 POST error code indicates a problem with the system board
1138) You have an old PC that you decide to upgrade with a 1 gig IDE hard drive. You find that you can't
configure CMOS to see the entire hard drive. The best you can do is 540 meg. Then use a device driver
that makes the bios see the drive as
1139) When SHADOWING is enabled in a computer's BIOS, instructions stored in various ROM chips are
copied into
1140) POST stands for power on self test
1141) Checking the hard disk while running Windows 3.1, you discover a very large file called
386SPART.PAR. That file is the Windows permanent swap file
1142) CMOS holds the computer's BIOS settings and maintains its data with the use of a battery for periods
when the system is powered down
1143) TSR stands for terminate and stay resident
1144) LAN is not an inter network
1145) Memory is temporary and storage is permanent
1146) Echo checking cannot assure data accuracy in an application
1147) Focus on manual records is not necessary for computerization system in a bank
1148) Permanent establishment, residence-based, Income based classification are the approaches used
to tax online transactions
1149) Computer-to-computer communication for business transactions is called EDI
1150) Client-server computing is used in Network multi-media
1151) Back up of files is taken for security
1152) Operating system is not a software category
1153) Computer program looking normal but containing harmful code is infected by Trojan horse
1154) Private key is used to append a digital signature
1155) Most dangerous risk in leaking of information is ignorance about the existence of risk
1156) IMAP (Internet message access protocol) takes care of E-mail at client side
1157) The CPU has control unit, arithmetic-logic unit and primary storage
1158) 1st generation computer is the bulkiest PC
1159) E-R diagram represents relationship between entities of system
1160) User is technically least sound
1161) Minicomputers is not there during fourth generation computer
1162) Microchip is unique to a smartcard
1163) Internet was started as a network for the defense forces of America
1164) A program permanently stored in hardware is called firmware
1165) Taking back-up of a file against crash is a curative measure

1166) Simplex transmission technique permits data flow in only one direction
1167) Front end processor relieves the host computer from tedious jobs and does them itself
1168) Software can't be touched
1169) Physical access to a database can be altered by hiring procedure
1170) The sound signal carried by telephone line is analog
1171) All decisions for LAN are taken by the IT steering committee
1172) An input device gives data to a computer
1173) Correction of program is done by debugging
1174) File transfer is the function of the application layer
1175) A policy on firewalls need not ensure that is logically secured
1176) A modem performs modulation, demodulation, data compression
1177) Personnel security does not fall under the category of operations to be performed during
development of software
1178) CASE tool cannot help with understanding requirements
1179) UPC cannot be used for source of data automation
1180) Banks use MICR devices to minimize the conversion process
1181) MP3 files cannot be navigated using ClipArt browser
1182) Close option in File pull-down menu is used to close a file in MSWord
1183) 3.5 inches is the size of a standard floppy disk
1184) When entering in a lot of text in capitals you should use the caps lock key
1185) Files created with Lotus 1-2-3 have an extension 123
1186) Contents, objects, Scenarios of a worksheet can be protected from accidental modification
1187) Device drivers that are loaded in Windows 3.x are loaded into the system.ini file
1188) 30 pin SIMMs, 72 pin SIMMs, 168 pin DIMMs types of RAM sockets can be seen on mother
boards
1189) The Power on self test determines the amount of memory present, the date/time, and which
communications ports and display adapters are installed in a microcomputer
1190) Virtual memory refers to using a file on the hard disk to simulate RAM
1191) BIOS (ROM) is considered firmware
1192) A popular application of computer networking is the worldwide system of newsgroups called Netnews
1193) a=10; export a is a valid command sequence in UNIX
1194) The date command is used in UNIX to display the system time
1195) Circuit-switched networks require that all channels in a message transmission path be of
the same speed
1196) The Vi program available under UNIX can be created to open a virtual terminal
1197) A 4-bit ring counter is initially loaded with 1001
1198) The standard defined for fiber optics is 802.8
1199) Digitizers can be converted from dumb to smart through the addition of a microprocessor
1200) The extension of database file is given by dbf
1201) VRML code is based on Unicode
1202) Usenet discussion groups have their own system of organization to help you find things just as
internet excel
1203) Http protocol is used for WWW
1204) Protocol conversion can be handled by gateway
1205) In ISDN teleservices, the network can change or process the contents of data
1206) A longer instruction length may be -1024 to 1023
1207) A microprocessor is a processor with a reduced instruction set and power requirement
1208) The term server refers to any device that offers a service to network users
1209) Using HTML, Front page, DHTML we can make web-site
1210) Usually security in a network is achieved by cryptography
1211) PSTN stands for public switched telephone network
1212) A thyratron cannot be used as an amplifier
1213) An input device conceptually similar to mouse is joystick
1214) Netscape navigator and other browsers such as the internet explorer are available free on the
internet

1215) In MS-logo Bye command is used to come out from that screen
1216) In C++ programming, the command to save the program file is F3
1217) Data lines which provide path for moving data between system modules are known as data bus
1218) Bubble sort technique does not use divide and conquer methodology
1219) The OSI model shows how the network functions of computer to be organized
1220) A 8 bit microprocessor must have 8 data lines
1221) A protocol that permits the transfer of files between computers on the network is FTP
1222) A data structure, in which an element is added and removed only from one end is known as stack
1223) In a linked list, the successive elements need not occupy contiguous space in memory
1224) In OSI model reference, layer 2 lies in between the physical layer and the network layer
1225) In synchronous TDM, for n signal sources, each frame contains at least n slots
1226) Mouse and joystick are graphic input devices
1227) In linked list, a node contains at least node number, data field
1228) Gopher is not a web browser
1229) Device drivers controls the interaction between the hardware devices and operating systems
1230) The shortest path in routing can refer to the least expensive path
1231) An ordinary pen which is used to indicate locations on the computer screen by sensing the ray of
light being emitted by the screen, is called light pen
1232) Netiquettes are some rules and regulations that have to be followed by users
1233) Gateway uses the greatest number of layers in the OSI model
1234) A set of standards by which servers communicate with external programs is called common
gateway interface
1235) UNIVAC is a computer belonging to the first generation
1236) API allows a client/server relationship to be developed between an existing host application and a
PC client
1237) Semiconductor is a substance which has resistivity in between conductors and insulators
1238) Multivibrator is a two-stage amplifier with the output of one stage fed back to the input of the other
1239) Macro is used to automate a particular task or a series of tasks
1240) Internet is network of networks
1241) A set of devices or combination of hardware and software that protects the systems on one side
from system on the other side is firewall
1242) Simple, transparent, multiport are bridge types
1243) When bandwidth of medium exceeds the required bandwidth of signals to be transmitted we use
frequency division multiplexing
1244) Direct or random access of element is not possible in linked list
1245) In Dos, the Label command is used to display the label of disk
1246) At the lower end of electromagnetic spectrum we have radio wave
1247) In Word, Ctrl + Del combination of keys is pressed to delete an entire word
1248) Plotters are very useful in applications such as computer aided design
1249) Web browser is a type of network application software
1250) 65535 characters can be typed in a single cell in Excel
1251) Overtime analysis is useful for formulating personnel policies and is derived from the payroll system
1252) Multiple worksheets can be created and used at a time
1253) UNIX is both time sharing and multiprogramming system
1254) Floppy Disk is universal portable and inexpensive but has its own limitation in terms of storage
capacity and speed
1255) Personal computers currently sold in India have main memories at an entry level in the range of
megabytes
1256) UNIX has better security for files relative to MS-DOS
1257) The UNIX operating system has been written in C language
1258) Syntax errors are flagged by compilers
1259) PARAM is an example of super computer
1260) Mother board holds the ROM, CPU, RAM and expansion cards
1261) CD-ROM is an optical memory, not a magnetic one
1262) The binary number system has a base 2
1263) GUI is used as an interface between software and user
1264) E-mail is transaction of letters, messages and memos over communications network
1265) Device drivers are small , special purpose programs
1266) LAN refers to a small, single site network
1267) One megabyte equals approximately 1 million bytes
1268) Magnetic tape is not practical for applications where data must be quickly recalled because tape is
a sequential access medium
1269) User id, URI and time stamp is not used by organization when a user visits its site
1270) DBMS takes care of storage of data in a database
1271) Plotters give the highest quality output
1272) Encrypting file system features of windows XP professional operating system protects the data of
a user, even if the computer is shared between users
1273) Loading is not required for high level language program before it is executed
1274) Top bottom approach cannot be the measure of network traffic
1275) Devices such as magnetic disks, hard disks, and compact disks, which are used to store
information, are secondary storage devices
1276) Various input and output devices have a standard way of connecting to the CPU and Memory.
These are called interface standards
1277) The place where the standard interfaces are provided to connect to the CPU and Memory is
known as Port
1278) Binary numbers are positional numbers
1279) The base of the hexadecimal system is sixteen
1280) Display capabilities of monitor are determined by adapter card
1281) Mouse has a use in graphical user interface and applications as input device
1282) Drum plotter, flat bed plotter, graphic display device is an output device
1283) The time taken to write a word in a memory is known as write time
1284) 1 MB is equivalent to 2^20 bytes
1285) A memory cell, which does not lose the bit stored in it when no power is supplied to the cell, is
known as non-volatile cell
1286) Magnetic surface recording devices used in computers, such as hard disks and floppy disks, along with optical
CD-ROMs, are called secondary / auxiliary storage devices

1287) The electronic circuits / devices used in building the computer that executes the software is
known as hardware
1288) Assembler is a translator which translates assembly language program into a machine language
program
1289) Interpreter is a translator which translates high level language program into a machine language
program
1290) Machine language programs are machine dependent
1291) The programs written in assembly language are machine dependent
1292) High level languages are developed to allow application programs, which are machine
independent
1293) The Vacuum tubes are related to first generation computers
1294) Mark I was the first computer that used mechanical switches
1295) First generation computers relied on machine language to perform operations, and they only
solve one problem at a time
1296) In first generation computers input was based on punched cards
1297) In second generation computers input was based on print outs

1298) Vacuum tubes generate more heat and consume more electricity
1299) Second generation computers moved from cryptic binary machine language to symbolic, or
assembly languages which allowed programmers to specify instructions in words
1300) Most electronic devices today use some form of integrated circuits placed on printed circuit
boards thin pieces of bakelite or fiberglass that have electrical connections etched onto them is called
mother board
1301) The operating system, which allowed the device to run many different applications at one time
with a central program that monitored the memory was introduced in third generation computers
1302) In third generation computers, users interacted through keyboards and monitors
1303) The fourth generation computers saw the development of GUIs, the mouse and handheld devices
1304) First computers that stored instructions in memory are second generation computers
1305) In second generation computers transistors replaced vacuum tubes
1306) The micro processor was introduced in fourth generation computer
1307) Integrated Circuits (IC) are introduced and the replacement of transistors started in third
generation computers
1308) Fifth generation computing is based on artificial intelligence

1309) Assembly language is low-level language
1310) In assembly language mnemonics are used to code operations, alphanumeric symbols are used for
address, language lies between high-level language and machine language
1311) The computers secondary memory is characterized by low cost per bit stored
1312) Acknowledgement from a computer that a packet of data has been received and verified is known
as ACK
1313) Acoustic coupler is a communications device which allows an ordinary telephone to be used with
a computer device for data transmission
1314) ALGOL is a high-level language
1315) A high level programming language named after Ada Augusta, coworker with Charles Babbage is
Ada
1316) Adder is a logic circuit capable of forming the sum of two or more quantities
1317) To identify a particular location in a storage area, one uses an address
1318) A local storage register in the CPU which contains the address of the next instruction to be
executed is referred as address register
1319) A sequence of precise and unambiguous instructions for solving a problem in a finite number of
operations is referred as algorithm

1320) PC/AT is an example of Bi-directional keyboard interface
1321) DIMM is not an I/O bus
1322) PCI bus is often called as mezzanine bus
1323) 8088 is the original IBM PC Intel CPU chip
1324) 80386 is a 32-bit processor
1325) A Pentium or Pentium pro board should run at 60 or 66 MHZ
1326) The maximum bandwidth of EISA bus is 33 M/sec
1327) A computer system runs millions of cycles per second so that speed is measured in MHz
1328) Heat sink is the metal device that draws heat away from an electronic device
1329) Pentium chip has 64 bit & 32 bit registers
1330) A mother board should contain at least 4 memory sockets
1331) The 1.2 MB drive spin at 360 rpm
1332) Intensity of sound is called amplitude
1333) A single zero bit is called starting bit
1334) PnP stands for plug and play
1335) A computers systems clock speed is measured as frequency
1336) Maximum RAM of XT type is 1 M
1337) The data in 8 bit bus is sent along 8 wires simultaneously in parallel
1338) The bus is simple series of connection that carry common signals
1339) Mainly memories are divided into two types they are logical memories and physical memories
1340) System based on the new Pentium II processor have the extended memory limit of 4G
1341) Full form of EMS memory is expanded memory specification
1342) The 286 & 386 SX CPUs have 24 address lines
1343) Bus has both the common meaning and computer meaning
1344) Data on a floppy disk is recorded in rings called tracks

1345) A group of characters that have a predefined meaning database
1346) In a spreadsheet a cell is defined as the intersection of a row and column
1347) Document file is created by word processing programs
1348) Real time is not a type of error
1349) The mouse device driver, if loaded in the config.sys file, is typically called mouse.sys
1350) Peer to peer means computer to computer
1351) RLL refers to run length limited
1352) The most important aspect of job scheduling is the ability to multiprogram
1353) Modem is a modulator-demodulator system
1354) A data communication system requires terminal device, communication channel, protocols
1355) The start button appears at the lower left of the screen
1356) Windows is GUI
1357) TIF stands for tagged image format
1358) The first network that initiated the internet is ARPANET
1359) In modems a digital signal changes some characteristic of a carrier wave
1360) Favorites are accessible from the start menu
1361) The virus is a software program
1362) BANKS is not memory chip
1363) We use RAM code to operate EGA
1364) A scanner is attached to LPT or SCSI host adapter port
1365) RAM bus is not a bus
1366) Processor bus is the fastest speed bus in system
1367) Resolution is the amount of detail that a monitor can render
1368) Sound blaster is a family of sound cards sold by creative labs
1369) MIDI is a family of sound cards sold by creative labs

1370) MIDI is a standard for connecting musical instruments to PCs
1371) By connecting a MIDI cable to the joystick port you can connect your PC to a MIDI device
1372) MPC stands for multimedia personal computer
1373) Video cards, video and graphics card are example of video & audio
1374) The display technology used by monitor is CRT
1375) On a LAN each personal computer is called workstations
1376) The CPU is the next most important file server after the hard disk
1377) It is best to use the game adaptor interface on the sound card and disable any other on system
1378) Mouse port uses keyboard controller
1379) The number of sectors per track in 1.44 MB floppy disk is 18
1380) A Pentium II system can address 64 GB of memory
1381) The mouse was invented by Engelbart
1382) The server's network adapter card is its link to all the workstations on the LAN
1383) Magnetic drives such as floppy and hard disk drives operate by electro magnetism
1384) Clock timing is used to determine the start and end of each bit cell
1385) Head designs are of 4 types
1386) Latency is the average time that it takes for a sector to be available after the heads
1387) Each sector is having 512 bytes
1388) DDD means digital diagnostic disk
1389) PC Technical is written in assembly language and has direct access to the systems hardware for
testing
1390) Check it pro deluxe gives detailed information about the system hardware
1391) The last 128k of reserved memory is used by mother board
1392) There are 80 cylinders on a 1.44 MB floppy
1393) IBM changed ROM on the system to support key boards
1394) +12 V signal for disk drive is used for power supply

1395) CD ROMs are single sided
1396) The storage capacity of a CDROM is 650 MB
1397) XGB has 2048 K graphic memory
1398) Most sound boards generate sounds by using fm synthesis
1399) Microscope helps you trouble shoot PS/2 system
1400) General purpose of diagnostic program run in batch mode
1401) The routing of data elements are called bits
1402) Operating system often called as kernel
1403) IPC stands for inter process communication
1404) Many programming errors are detected by the hardware
1405) When a process exits, the operating system must free the disk space used by its memory image
1406) Buffering attempts to keep both CPU and I/O device busy all the time
1407) Software is not an example of file mapping
1408) The most important aspect of job scheduling is the ability to multiprogram
1409) There are two types of floppy disks
1410) System calls can be grouped into three major categories
1411) FIFO stands for first in first out
1412) In the two-level directory structure, each user has its own user file directory
1413) The number of bytes in a page is always a power of 2
1414) A process that does not terminate while the operating system is functioning is called dynamic
1415) The three main types of computer programming languages are machine language, assembly
language, high level language
1416) There are 2 types of processor modes
1417) An input device is an electromechanical device that generates data for a computer to read
1418) The first implementation of UNIX was developed at Bell Telephone Laboratories in the early 1970s
1419) A connection less protocol is more dynamic

1420) The throughput is a measure of work for processor
1421) COBOL stands for common business oriented language
1422) Reset button is used to do cold booting
1423) In MS-EXCEL the no of rows, no of columns are 16384,256
1424) FOXPRO is a package and programming language
1425) Mostly used date format in computer is MM-DD-YY
1426) Reservation of train ticket uses real time mode of processing
1427) A floppy disk is a thin plastic disc coated with magnetic oxide
1428) In binary addition 1 + 1 = 10 (sum 0, carry 1)
1429) Universal building blocks of a computer system are NAND & NOR
1430) In COBOL programming characters length per line is 64
1431) In FOXPRO, the maximum fields in a record are 128
1432) QUIT command is used to come out of FOXPRO
1433) Count command is used to count the specified records in a file with or without condition
1434) F1 key is pressed for help in FOXPRO
1435) One Giga byte = 1024 mega bytes
1436) Small scale integration chip contains less than 12 gates
1437) The most common monitor sizes are 14, 15, 17
1438) IBM PC and DOS has BIOS support for 3 LPT ports
1439) EDO stands for extended data out
1440) The number of wires in an IDE hard disk cable is 40
1441) If the data transfer rate is 150 k/sec then it is called single speed
1442) A typical buffer for a CD-ROM drive is 156 K
1443) The 2.88 M floppy drives have 36 no of sectors
1444) 1 MB is equal to 2^10 KB
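The unit relations in items 1435 and 1444, and the result of adding 1 and 1 in base 2, can be verified with a few lines of arithmetic. This is a small Python sketch using the binary (power-of-two) definitions of the units; the constant names are illustrative.

```python
KB = 2 ** 10          # bytes in one kilobyte (binary definition)
MB = 2 ** 10 * KB     # 1 MB = 2^10 KB
GB = 1024 * MB        # 1 GB = 1024 MB

mb_in_bytes = MB                  # equals 2^20 bytes
binary_sum = bin(0b1 + 0b1)       # adding 1 and 1 in base 2

print(mb_in_bytes, GB // MB, binary_sum)
```

The binary sum is written with two digits because the addition produces a sum bit of 0 and a carry of 1.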

1445) ROM is a primary storage device
1446) The process of loading and starting up DOS is called booting
1447) The Dos Prompt is mainly C:\>
1448) C:\>ver is used for displaying current version name
1449) Intel 440FX Natoma is an example of a motherboard chipset
1450) Direct memory access channels are used by medium speed communication devices
1451) The 486 Sx chip is twice as fast as a 386 Dx with same clock speed
1452) On a 286 or 386 sx system, the extended memory limits 16M
1453) The 386 DX CPU has 32 address lines
1454) Shared memory does require 16 K of VMA space
1455) DIMMs are 168 pin modules
1456) The function of the +12V power supply is to run disk drive motors and also cooling fans
1457) SPS generally referred as rechargeable batteries
1458) Parallel interface is not the interfaces that can be used to connect the mouse to the computer
1459) The mouse interrupts usually occurs if the system uses a mouse port
1460) The video adapter BIOS handles communication between the Video Chipset & Video Ram
1461) Pentium Pro CPUs have 36 address lines and can keep track of 64 GB of memory
1462) The processor bus is the communications path way between CPU and immediate support chip
1463) VL-bus can move data 32 bit at a time
1464) A modem attached to system on COM ports
1465) Horizontal scan refers to the speed at which the electron beam moves across the screen
1466) RGB monitor display 80 column text
1467) When transition changes from negative to positive the head would detect positive voltage spike
1468) Animation means to make still picture, move and talk like in cartoon pictures
1469) Analog: The use of a system in which the data is of a continuously variable physical quantity
such as voltage or angular position

1470) Animation: A simulation of movement created by displaying a series of pictures or frames
1471) Application means a piece of software designed to meet a specific purpose
1472) Active X is a model for writing programs. Active X technology is used to make interactive Web
pages that look and behave like computer programs, rather than static pages. With Active X, users can
ask or answer questions, use push buttons, and interact in other ways with the web page
1473) Batch processing is a technique in which a number of similar items or transactions are processed
in groups or batches during a machine run
1474) BIS: Bureau of Indian Standards. It is a national organization of India to define standards
1475) Browser is a link between the computer and the internet. It's actually a program that provides a
way to look at and interact with all information on the internet. A browser is a client program that
uses the Hypertext Transfer Protocol (HTTP) to make requests of Web servers throughout the Internet on
behalf of the browser user.
1476) CIO (Chief information officer) : The senior executive in a company responsible for information
management and for delivering IT services
1477) Client/server architecture: A type of network in which computer processing is distributed among
many individual PCs and a more powerful, central computer. Clients can share files and retrieve data
stored on the server
1478) Collaborative software: Groupware, such as Lotus Notes or Microsoft Exchange
1479) Computer-Aided design: Refers to any computer-enabled method of design, also called computer-assisted design.
1480) Computer: A group of electronic devices used for performing multipurpose tasks
1481) Channel: It consists of controller card, interface cable and power supply
1482) CORBA: CORBA is the acronym for Common Object Request Broker Architecture
1483) CBT: Computer based training
1484) Certification: Skills and knowledge assessment process
1485) Computer Crime: The act of stealing, cheating or otherwise defrauding an organization with the
use of a computer
1486) Cyber café: Café offering internet browsing facility
1487) Cryptography: Method used to protect privacy and security on the internet
1488) DBMS: An acronym for the database management system. A program that maintains and controls
the access to collection of related information in electronic files
1489) Data: Facts coded and structured for subsequent processing, generally using a computer system
1490) Digital signature: Encrypted signature used for providing security for the messages/data
transferred through the internet

1491) Digital computer: A device that manipulates discrete data and performs arithmetic and logic
operations on these data
1492) Data transmission: The movement of data from one location of storage to another. If the locations
are geographically far away, generally done via satellites.
1493) Disk Mirroring: The data is written on two or more hard disks simultaneously over the same
channel
1494) Disk Duplexing: The data is written on two or more hard disks simultaneously over different
channels
1495) Dumb Terminals: Hardware configuration consisting of a keyboard and monitor that is capable of
sending and receiving information but has no memory or processing capabilities.
1496) Download: Process of transferring a file from one system to another
1497) E-commerce: Business transactions conducted over extranets or the internet
1498) Enterprise resource planning: An integrated system of operation applications combining logistics,
production, contract and order management, sales forecasting and financial and HR management
1499) Electronic data interchange (EDI) : Electronic transmission of documents through point to point
connections using a set of standard forms, message and data elements, this can be via leased lines
private networks or the internet
1500) Data processing: It is a method concerning with the systematic recording, arranging, filing,
processing and dissemination of facts of business
Basic Programming Concepts

CS10001: Programming & Data Structures
Pallab Dasgupta
Professor, Dept. of Computer Sc. & Engg.,
Indian Institute of Technology Kharagpur
Dept. of CSE, IIT KGP
Some Terminologies
Algorithm / Flowchart
A step-by-step procedure for solving a particular problem.
Independent of the programming language.
Program
A translation of the algorithm/flowchart into a form that can be
processed by a computer.
Typically written in a high-level language like C, C++, Java, etc.
Variables and Constants
Most important concept for problem solving using computers
All temporary results are stored in terms of variables

The value of a variable can be changed.
The value of a constant does not change.
Where are they stored?

In main memory.
Contd.
What does memory look like (logically)?

As a list of storage locations, each having a unique address.
Variables and constants are stored in these storage locations.
A variable is like a bin
The content of the bin is the value of the variable
The variable name is used to refer to the value of the variable
A variable is mapped to a location of the memory, called its
address
Memory map
Address 0, Address 1, Address 2, ..., Address N-1
Every variable is mapped to a particular memory address
Variables in Memory
Instruction executed (in time order)    Variable X
X = 10                                  10
X = 20                                  20
X = X + 1                               21
X = X * 5                               105
Variables in Memory (contd.)

Instruction executed (in time order)    Variable X    Variable Y
X = 20                                  20            ? (not yet assigned)
Y = 15                                  20            15
X = Y + 3                               18            15
Y = X / 6                               18            3
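The trace above can be replayed in a short Python sketch. The function name `trace_xy` is illustrative only, and integer division (`//`) is used on the assumption that X and Y are integer variables, as they would be in a C program.

```python
def trace_xy():
    """Replay the four assignments and record X and Y after each one."""
    history = []
    x = 20
    history.append(("X = 20", x, None))      # Y has not been assigned yet
    y = 15
    history.append(("Y = 15", x, y))
    x = y + 3                                 # old value of X (20) is overwritten
    history.append(("X = Y + 3", x, y))
    y = x // 6                                # integer division (assumption)
    history.append(("Y = X / 6", x, y))
    return history


for step, x, y in trace_xy():
    print(f"{step:<10} X={x} Y={y}")
```

Each assignment replaces the previous content of the bin; the other variable is left untouched.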
Data Types
Three common data types used:
Integer :: can store only whole numbers
Examples: 25, -56, 1, 0
Floating-point :: can store numbers with fractional values.
Examples: 3.14159, 5.0, -12345.345
Character :: can store a character

Examples: 'A', 'a', '*', '3', ' ', '+'
Data Types (contd.)

How are they stored in memory?
Integer :: 16 bits or 32 bits
Float :: 32 bits or 64 bits
Char :: 8 bits (ASCII code) or 16 bits (UNICODE, used in Java)
The actual number of bits varies from one computer to another.
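To see what the bit widths mean in practice, the sketch below computes the range of values each width can hold. It assumes two's-complement signed integers, which the slides do not state; the function names are illustrative.

```python
def signed_range(bits):
    """Smallest and largest value of a two's-complement integer (assumption)."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_max(bits):
    """Largest value of an unsigned quantity with the given number of bits."""
    return 2 ** bits - 1


for bits in (16, 32):
    low, high = signed_range(bits)
    print(f"{bits}-bit integer: {low} to {high}")
print("8-bit character codes: 0 to", unsigned_max(8))
```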
Problem solving
Step 1:
Clearly specify the problem to be solved.
Step 2:
Draw flowchart or write algorithm.
Step 3:
Convert flowchart (algorithm) into program code.
Step 4:
Compile the program into object code.
Step 5:
Execute the program.
Flowchart: basic symbols
Computation
Input / Output
Decision Box
Start / Stop
Contd.
Flow of control
Connector
Example 1: Adding three numbers
START
READ A, B, C
S=A+B+C
OUTPUT S
STOP
Example 2: Larger of two numbers

START
READ X, Y
IS X > Y?
    YES: OUTPUT X, then STOP
    NO:  OUTPUT Y, then STOP
Example 3: Largest of three numbers

START
READ X, Y, Z
IS X > Y?
    YES: Max = X
    NO:  Max = Y
IS Max > Z?
    YES: OUTPUT Max, then STOP
    NO:  OUTPUT Z, then STOP
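A minimal Python sketch of the flowchart for the largest of three numbers, with one `if` per decision box. The function name `largest_of_three` is illustrative.

```python
def largest_of_three(x, y, z):
    # First decision box: keep the larger of X and Y in Max.
    if x > y:
        max_value = x
    else:
        max_value = y
    # Second decision box: compare Max with Z.
    if max_value > z:
        return max_value
    return z


print(largest_of_three(3, 7, 5))
```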
Example 4: Sum of first N natural numbers

START
READ N
SUM = 0
COUNT = 1
SUM = SUM + COUNT
COUNT = COUNT + 1
NO
IS
COUNT > N?
YES
OUTPUT SUM
STOP
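The loop in this flowchart tests COUNT > N after the body has run, so the body always executes at least once. The sketch below keeps that structure; `sum_first_n` is an illustrative name.

```python
def sum_first_n(n):
    total = 0
    count = 1
    while True:
        total = total + count      # SUM = SUM + COUNT
        count = count + 1          # COUNT = COUNT + 1
        if count > n:              # decision box, tested after the body
            break
    return total


print(sum_first_n(5))
```

Because the test comes last, an input below 1 still adds the first term; a program that must accept such input would need a check before the loop.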
Example 5: SUM = 1^2 + 2^2 + 3^2 + ... + N^2
START
READ N
SUM = 0
COUNT = 1
SUM = SUM + COUNT * COUNT
COUNT = COUNT + 1
NO
IS
COUNT > N?
YES
OUTPUT SUM
STOP
Example 6: SUM = 1*2 + 2*3 + 3*4 + ... to N terms

START
READ N
SUM = 0
COUNT = 1
SUM = SUM + COUNT * (COUNT + 1)
COUNT = COUNT + 1
NO
IS
COUNT > N?
YES
OUTPUT SUM
STOP
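The sum-of-squares and sum-of-products flowcharts differ only in the term added on each pass. The sketch below codes both loops and compares them with the standard closed forms N(N+1)(2N+1)/6 and N(N+1)(N+2)/3, which are not part of the slides and are included only as a cross-check.

```python
def sum_of_squares(n):
    total, count = 0, 1
    while True:
        total += count * count           # SUM = SUM + COUNT * COUNT
        count += 1
        if count > n:
            break
    return total


def sum_of_products(n):
    total, count = 0, 1
    while True:
        total += count * (count + 1)     # SUM = SUM + COUNT * (COUNT + 1)
        count += 1
        if count > n:
            break
    return total


for n in range(1, 6):
    assert sum_of_squares(n) == n * (n + 1) * (2 * n + 1) // 6
    assert sum_of_products(n) == n * (n + 1) * (n + 2) // 3
print(sum_of_squares(3), sum_of_products(3))
```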
Example 7: Computing Factorial

START
READ N
PROD = 1
COUNT = 1
PROD = PROD * COUNT
COUNT = COUNT + 1
NO
IS
COUNT > N?
YES
OUTPUT PROD
STOP
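The factorial flowchart has the same shape as the summation loops, with multiplication in place of addition and PROD starting at 1. A minimal sketch, checked against Python's `math.factorial`:

```python
import math


def factorial(n):
    prod = 1
    count = 1
    while True:
        prod = prod * count        # PROD = PROD * COUNT
        count = count + 1
        if count > n:
            break
    return prod


for n in range(1, 8):
    assert factorial(n) == math.factorial(n)
print(factorial(5))
```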
Example 8: Computing e^x series up to N terms

START
READ X, N
TERM = 1
SUM = 0
COUNT = 1
SUM = SUM + TERM
TERM = TERM * X / COUNT
COUNT = COUNT + 1
NO
IS
COUNT > N?
YES
OUTPUT SUM
STOP
Example 9: Computing e^x series up to 4 decimal places

START
READ X, N
TERM = 1
SUM = 0
COUNT = 1
SUM = SUM + TERM
TERM = TERM * X / COUNT
COUNT = COUNT + 1
NO
IS
TERM < 0.0001?
YES
OUTPUT SUM
STOP
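Both e^x flowcharts build each term from the previous one (TERM = TERM * X / COUNT) and differ only in the stopping test. The sketch below codes the two variants. One deliberate change is labelled in the code: the tolerance test uses `abs(term)`, because for negative X the terms alternate in sign and the test as drawn would stop too early. The function names and the `tol` parameter are illustrative.

```python
import math


def exp_n_terms(x, n):
    """Sum the first n terms of 1 + x + x^2/2! + x^3/3! + ..."""
    term, total, count = 1.0, 0.0, 1
    while True:
        total += term
        term = term * x / count
        count += 1
        if count > n:
            break
    return total


def exp_to_tolerance(x, tol=0.0001):
    """Keep adding terms until the next term is smaller than tol."""
    term, total, count = 1.0, 0.0, 1
    while True:
        total += term
        term = term * x / count
        count += 1
        if abs(term) < tol:        # abs() added here; the flowchart tests TERM itself
            break
    return total


print(exp_n_terms(1.0, 10), exp_to_tolerance(1.0), math.e)
```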
Example 10: Roots of a quadratic equation
ax^2 + bx + c = 0
TRY YOURSELF
Example 11: Grade computation

Marks                   Grade
MARKS ≥ 90              Ex
89 ≥ MARKS ≥ 80         A
79 ≥ MARKS ≥ 70         B
69 ≥ MARKS ≥ 60         C
59 ≥ MARKS ≥ 50         D
49 ≥ MARKS ≥ 35         P
34 ≥ MARKS              F
Grade Computation (contd.)

START
READ MARKS
MARKS ≥ 90?  YES: OUTPUT "Ex", then STOP
NO: MARKS ≥ 80?  YES: OUTPUT "A", then STOP
NO: MARKS ≥ 70?  YES: OUTPUT "B", then STOP
NO: MARKS ≥ 60?  YES: OUTPUT "C", then STOP
NO: MARKS ≥ 50?  YES: OUTPUT "D", then STOP
NO: MARKS ≥ 35?  YES: OUTPUT "P", then STOP
NO: OUTPUT "F", then STOP
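The grade flowchart is a chain of decisions tested from the highest threshold down, so each test only needs a lower bound. A minimal sketch follows; it assumes the comparison in each decision box is "greater than or equal to", since the comparison symbols are missing from the extracted slide text.

```python
def grade(marks):
    # Thresholds are checked from highest to lowest, so the first match wins.
    if marks >= 90:
        return "Ex"
    if marks >= 80:
        return "A"
    if marks >= 70:
        return "B"
    if marks >= 60:
        return "C"
    if marks >= 50:
        return "D"
    if marks >= 35:
        return "P"
    return "F"


for m in (95, 84, 70, 61, 50, 35, 34):
    print(m, grade(m))
```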
Network Security: History, Importance, and Future
University of Florida Department of Electrical and Computer Engineering
Bhavya Daya
ABSTRACT
Network security has become more important to personal computer users, organizations, and the military. With the advent of the internet, security became a major concern, and the history of security allows a better understanding of the emergence of security technology. The internet structure itself allowed for many security threats to occur. The architecture of the internet, when modified, can reduce the possible attacks that can be sent across the network. Knowing the attack methods allows for the appropriate security to emerge. Many businesses secure themselves from the internet by means of firewalls and encryption mechanisms. The businesses create an "intranet" to remain connected to the internet but secured from possible threats.
The entire field of network security is vast and in an evolutionary stage. The range of study encompasses a brief history dating back to the internet's beginnings and the current development in network security. In order to understand the research being performed today, background knowledge of the internet, its vulnerabilities, attack methods through the internet, and security technology is important, and therefore they are reviewed.
INTRODUCTION
The world is becoming more interconnected with the advent of the Internet and new networking technology. There is a large amount of personal, commercial, military, and government information on networking infrastructures worldwide. Network security is becoming of great importance because of intellectual property that can be easily acquired through the internet.
There are currently two fundamentally different networks: data networks and synchronous networks comprised of switches. The internet is considered a data network. Since the current data network consists of computer-based routers, information can be obtained by special programs, such as "Trojan horses," planted in the routers. The synchronous network that consists of switches does not buffer data and therefore is not threatened by attackers. That is why security is emphasized in data networks, such as the internet, and other networks that link to the internet.
The vast topic of network security is analyzed by researching the following:
1. History of security in networks
2. Internet architecture and vulnerable security aspects of the Internet
3. Types of internet attacks and security methods
4. Security for networks with internet access
5. Current development in network security hardware and software
Based on this research, the future of network security is forecasted. New trends that are emerging will also be considered to understand where network security is heading.
1. Network Security
System and network technology is a key technology for a wide variety of applications. Security is crucial to networks and applications. Although network security is a critical requirement in emerging networks, there is a significant lack of security methods that can be easily implemented.
There exists a "communication gap" between the developers of security technology and developers of networks. Network design is a well-developed process that is based on the Open Systems Interconnection (OSI) model. The OSI model has several advantages when designing networks. It offers modularity, flexibility, ease of use, and standardization of protocols. The protocols of different layers can be easily combined to create stacks which allow modular development. The implementation of individual layers can be changed later without making other adjustments, allowing flexibility in development. In contrast to network design, secure network design is not a well-developed process. There isn't a methodology to manage the complexity of security requirements. Secure network design does not contain the same advantages as network design.
An effective network security plan is developed with the understanding of security issues, potential attackers, needed level of security, and factors that make a network vulnerable to attack [1]. The steps involved in understanding the composition of a secure network, internet or otherwise, are followed throughout this research endeavor.
To lessen the vulnerability of the computer to the network there are many products available. These tools are encryption, authentication mechanisms, intrusion detection, security management and firewalls. Businesses throughout the world are using a combination of some of these tools. "Intranets" are both connected to the internet and reasonably protected from it. The internet architecture itself leads to vulnerabilities in the network. Understanding the security issues of the internet greatly assists in developing new security technologies and approaches for networks with internet access and internet security itself.
The types of attacks through the internet need to also be studied to be able to detect and guard against them. Intrusion detection systems are established based on the types of attacks most commonly used. Network intrusions consist of packets that are introduced to cause problems for the following reasons:
To consume resources uselessly
To interfere with any system resource's intended function
To gain system knowledge that can be exploited in later attacks
The last reason for a network intrusion is most commonly guarded against and considered by most as the only intrusion motive. The other reasons mentioned need to be thwarted as well.
When considering network security, it must be emphasized that the whole network is secure. Network security does not only concern the security in the computers at each end of the communication chain. When transmitting data the communication channel should not be vulnerable to attack. A possible hacker could target the communication channel, obtain the data, decrypt it and re-insert a false message. Securing the network is just as important as securing the computers and encrypting the message.
When developing a secure network, the following need to be considered [1]:
1. Access: authorized users are provided the means to communicate to and from a particular network
2. Confidentiality: Information in the network remains private
3. Authentication: Ensure the users of the network are who they say they are
4. Integrity: Ensure the message has not been modified in transit
5. Non-repudiation: Ensure the user does not refute that he used the network
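The integrity requirement in the list above (the message has not been modified in transit) can be illustrated with a keyed message authentication code. The paper does not name a mechanism; HMAC-SHA256 from Python's standard library is used here purely as one common choice, and the key, messages, and function names are hypothetical.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Sender side: compute a tag over the message with a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


shared_key = b"example-shared-key"            # hypothetical key
original = b"transfer 100 to account 42"      # hypothetical message
tag = make_tag(shared_key, original)

print(verify(shared_key, original, tag))                        # unmodified message
print(verify(shared_key, b"transfer 900 to account 42", tag))   # altered in transit
```

A shared-key tag of this kind addresses integrity only. It does not give non-repudiation, since either holder of the key could have produced the tag.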
Typical security currently exists on the computers connected to the network. Security protocols usually appear as part of a single layer of the OSI network reference model. Current work is being performed in using a layered approach to secure network design. The layers of the security model correspond to the OSI model layers. This security approach leads to an effective and efficient design which circumvents some of the common security problems.
2. Differentiating Data Security and Network Security
Data security is the aspect of security that allows a client's data to be transformed into unintelligible data for transmission. Even if this unintelligible data is intercepted, a key is needed to decode the message. This method of security is effective to a certain degree. Strong cryptography in the past can be easily broken today. Cryptographic methods have to continue to advance due to the advancement of the hackers as well.
When transferring ciphertext over a network, it is helpful to have a secure network. This will allow for the ciphertext to be protected, so that it is less likely for many people to even attempt to break the code. A secure network will also prevent someone from inserting unauthorized messages into the network. Therefore, hard ciphers are needed as well as attack-hard networks [2].
Figure 1: Based on the OSI model, data security and network security have a different security function [2].
The relationship of network security and data security to the OSI model is shown in Figure 1. It can be seen that the cryptography occurs at the application layer; therefore the application writers are aware of its existence. The user can possibly choose different methods of data security. Network security is mostly contained within the physical layer. Layers above the physical layer are also used to accomplish the network security required [2]. Authentication is performed on a layer above the physical layer. Network security in the physical layer requires failure detection, attack detection mechanisms, and intelligent countermeasure strategies [2].
HISTORY OF NETWORK SECURITY
Recent interest in security was fueled by the crime committed by Kevin Mitnick. Kevin Mitnick committed the largest computer-related crime in U.S. history [3]. The losses were eighty million dollars in U.S. intellectual property and source code from a variety of companies [3]. Since then, information security came into the spotlight.
Public networks are being relied upon to deliver financial and personal information. Due to the evolution of information that is made available through the internet, information security is also required to evolve. Due to Kevin Mitnick's offense, companies are emphasizing security for the intellectual property. The Internet has been a driving force for data security improvement.
Internet protocols in the past were not developed to secure themselves. Within the TCP/IP communication stack, security protocols are not implemented. This leaves the internet open to attacks. Modern developments in the internet architecture have made communication more secure.
1. Brief History of Internet
The birth of the internet takes place in 1969 when the Advanced Research Projects Agency Network (ARPANet) is commissioned by the Department of Defense (DOD) for research in networking.
The ARPANET is a success from the very beginning. Although originally designed to allow scientists to share data and access remote computers, email quickly becomes the most popular application. The ARPANET becomes a high-speed digital post office as people use it to collaborate on research projects and discuss topics of various interests. The InterNetworking Working Group becomes the first of several standards-setting entities to govern the growing network [10]. Vinton Cerf is elected the first chairman of the INWG, and later becomes known as a "Father of the Internet." [10]
In the 1980s, Bob Kahn and Vinton Cerf are key members of a team that creates TCP/IP, the common language of all Internet computers. For the first time the loose collection of networks which made up the ARPANET is seen as an "Internet", and the Internet as we know it today is born. The mid-80s marks a boom in the personal computer and super-minicomputer industries. The combination of inexpensive desktop machines and powerful, network-ready servers allows many companies to join the Internet for the first time. Corporations begin to use the Internet to communicate with each other and with their customers.
In the 1990s, the internet began to become available to the public. The World Wide Web was born. Netscape and Microsoft were both competing on developing a browser for the internet. The Internet continues to grow, and surfing the internet has become equivalent to TV viewing for many users.
2. Security Timeline
Several key events contributed to the birth and evolution of computer and network security. The timeline can be started as far back as the 1930s.
The Enigma machine, patented in Germany in 1918, converted plain messages to encrypted text. Polish cryptographers first broke Enigma ciphers in 1932, and Alan Turing, a brilliant mathematician, later led work that broke the wartime Enigma code. Securing communications was essential in World War II.
In the 1960s, the term "hacker" is coined by a couple of Massachusetts Institute of Technology (MIT) students. The Department of Defense began the ARPANet, which gains popularity as a conduit for the electronic exchange of data and information [3]. This paves the way for the creation of the carrier network known today as the Internet. During the 1970s, the Telnet protocol was developed. This opened the door for public use of data networks that were originally restricted to government contractors and academic researchers [3].
During the 1980s, hackers and crimes relating to computers were beginning to emerge. The 414 gang are raided by authorities after a nine-day cracking spree where they break into top-secret systems. The Computer Fraud and Abuse Act of 1986 was created because of Ian Murphy's crime of stealing information from military computers. A graduate student, Robert Morris, was convicted for unleashing the Morris Worm to over 6,000 vulnerable computers connected to the Internet. Based on concerns that the Morris Worm ordeal could be replicated, the Computer Emergency Response Team (CERT) was created to alert computer users of network security issues.
In the 1990s, the Internet became public and the security concerns increased tremendously. Approximately 950 million people use the internet today worldwide [3]. On any day, there are approximately 225 major incidences of a security breach [3]. These security breaches could also result in monetary losses of a large degree. Investment in proper security should be a priority for large organizations as well as common users.
INTERNET ARCHITECTURE AND VULNERABLE SECURITY ASPECTS
Fear of security breaches on the Internet is causing organizations to use protected private networks or intranets [4]. The Internet Engineering Task Force (IETF) has introduced security mechanisms at various layers of the Internet Protocol Suite [4]. These security mechanisms allow for the logical protection of data units that are transferred across the network.
The security architecture of the internet protocol, known as IP Security, is a standardization of internet security. IP security, IPsec, covers the new generation of IP (IPv6) as well as the current version (IPv4). Although new techniques, such as IPsec, have been developed to overcome the internet's best-known deficiencies, they seem to be insufficient [5]. Figure 2 shows a visual representation of how IPsec is implemented to provide secure communications.
IPsec is a point-to-point protocol: one side encrypts, the other decrypts, and both sides share a key or keys. IPsec can be used in two modes, namely transport mode and tunnel mode.
Figure 2: IPsec contains a gateway and a tunnel in order to secure communications. [17]
The current version and new version of the Internet Protocol are analyzed to determine the security implications. Although security may exist within the protocol, certain attacks cannot be guarded against. These attacks are analyzed to determine other security mechanisms that may be necessary.
1. IPv4 and IPv6 Architectures
IPv4 was designed in 1980 to replace the NCP protocol on the ARPANET. IPv4 displayed many limitations after two decades [6]. The IPv6 protocol was designed with IPv4's shortcomings in mind. IPv6 is not a superset of the IPv4 protocol; instead it is a new design.
The internet protocol's design is so vast that it cannot be covered fully. The main parts of the architecture relating to security are discussed in detail.
1.1 IPv4 Architecture
The protocol contains a couple of aspects which caused problems with its use. These problems do not all relate to security. They are mentioned to gain a comprehensive understanding of the internet protocol and its shortcomings. The causes of problems with the protocol are:
1. Address Space
2. Routing
3. Configuration
4. Security
5. Quality of Service
The IPv4 architecture has an address that is 32 bits wide [6]. This limits the maximum number of computers that can be connected to the internet. The 32-bit address provides for a maximum of 2^32, or about 4.3 billion, addresses. The problem of exceeding that number was not foreseen when the protocol was created. The small address space of IPv4 facilitates malicious code distribution [5].
Routing is a problem for this protocol because the routing tables are constantly increasing in size. The maximum theoretical size of the global routing tables was 2.1 million entries [6]. Methods have been adopted to reduce the number of entries in the routing table. This is helpful for a short period of time, but drastic change needs to be made to address this problem.
The TCP/IP-based networking of IPv4 requires that the user supplies some data in order to configure a network. Some of the information required is the IP address, routing gateway address, subnet mask, and DNS server. The simplicity of configuring the network is not evident in the IPv4 protocol. The user can request appropriate network configuration from a central server [6]. This eases configuration hassles for the user but not the network's administrators.
The lack of embedded security within the IPv4 protocol has led to the many attacks seen today. Mechanisms to secure IPv4 do exist, but there are no requirements for their use [6]. IPsec is a specific mechanism used to secure the protocol. IPsec secures the packet payloads by means of cryptography. IPsec provides the services of confidentiality, integrity, and authentication [6]. This form of protection does not account for the skilled hacker who may be able to break the encryption method and obtain the key.
When the internet was created, the quality of service (QoS) was standardized according to the information that was transferred across the network. The original transfer of information was mostly text-based. As the internet expanded and technology evolved, other forms of communication began to be transmitted across the internet. The quality of service required for streaming videos and music is much different from that for standard text. The protocol does not have the functionality of dynamic QoS that changes based on the type of data being communicated [6].
1.2 IPv6 Architecture
When IPv6 was being developed, emphasis was placed on aspects of the IPv4 protocol that needed to be improved. The development efforts were placed in the following areas:
1. Routing and addressing
2. Multi-protocol architecture
3. Security architecture
4. Traffic control
The IPv6 protocol's address space was extended by supporting 128-bit addresses. With 128-bit addresses, the protocol can support up to 3.4 × 10^38 machines. The address bits are used less efficiently in this protocol because it simplifies addressing configuration.
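The address-space figures follow directly from the address widths: an n-bit address allows 2^n distinct values. The short sketch below does the arithmetic and cross-checks it with Python's standard `ipaddress` module; the variable names are illustrative.

```python
import ipaddress

ipv4_total = 2 ** 32     # 32-bit addresses
ipv6_total = 2 ** 128    # 128-bit addresses

print(f"IPv4: {ipv4_total:,} addresses")
print(f"IPv6: {ipv6_total:.1e} addresses")

# Cross-check: the network covering every address has exactly 2^n members.
assert ipaddress.ip_network("0.0.0.0/0").num_addresses == ipv4_total
assert ipaddress.ip_network("::/0").num_addresses == ipv6_total
```

These are upper bounds on distinct addresses. The number of usable host addresses is lower, since some ranges are reserved.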
The IPv6 routing system is more efficient and enables smaller global routing tables. The host configuration is also simplified. Hosts can automatically configure themselves. This new design allows ease of configuration for the user as well as the network administrator.
The security architecture of the IPv6 protocol is of great interest. IPsec is embedded within the IPv6 protocol. IPsec functionality is the same for IPv4 and IPv6. The only difference is that IPv6 can utilize the security mechanism along the entire route [6].
The quality of service problem is handled with IPv6. The internet protocol allows for special handling of certain packets with a higher quality of service.
From a high-level view, the major benefits of IPv6 are its scalability and increased security. IPv6 also offers other interesting features that are beyond the scope of this paper.
It must be emphasized that after researching IPv6 and its security features, it is not necessarily more secure than IPv4. The approach to security is only slightly better, not a radical improvement.
2. Attacks through the Current Internet Protocol IPv4
There are four main computer security attributes. They were mentioned before in a slightly different form, but are restated for convenience and emphasis. These security attributes are confidentiality, integrity, privacy, and availability.
Confidentiality and integrity still hold to the same definition. Availability means the computer assets can be accessed by authorized people [8]. Privacy is the right to protect personal secrets [8]. Various attack methods relate to these four security attributes. Table 1 shows the attack methods and solutions.
Table 1: Attack Methods and Security Technology [8]
Common attack methods and the security technology will be briefly discussed. Not all of the methods in the table above are discussed. The current technology for dealing with attacks is understood in order to comprehend the current research developments in security hardware and software.
2.1 Common Internet Attack Methods
Common internet attack methods are broken down into categories. Some attacks gain system knowledge or personal information, such as eavesdropping and phishing. Attacks can also interfere with the system's intended function, such as viruses, worms and trojans. The other form of attack is when the system's resources are consumed uselessly; this can be caused by a denial of service (DoS) attack. Other forms of network intrusions also exist, such as land attacks, smurf attacks, and teardrop attacks. These attacks are not as well known as DoS attacks, but they are used in some form or another even if they aren't mentioned by name.
2.1.1 Eavesdropping
Interception of communications by an unauthorized party is called eavesdropping. Passive eavesdropping is when the person only secretly listens to the networked messages. On the other hand, active eavesdropping is when the intruder listens and inserts something into the communication stream. This can lead to the messages being distorted. Sensitive information can be stolen this way [8].
2.1.2 Viruses
Viruses are self-replicating programs that use files to infect and propagate [8]. Once a file is opened, the virus will activate within the system.
2.1.3 Worms
A worm is similar to a virus because they both are self-replicating, but the worm does not require a file to allow it to propagate [8]. There are two main types of worms, mass-mailing worms and network-aware worms. Mass-mailing worms use email as a means to infect other computers. Network-aware worms are a major problem for the Internet. A network-aware worm selects a target and once the worm accesses the target host, it can infect it by means of a Trojan or otherwise.
2.1.4 Trojans
Trojans appear to be benign programs to the user, but will actually have some malicious purpose. Trojans usually carry some payload such as a virus [8].
2.1.5 Phishing
Phishing is an attempt to obtain confidential information from an individual, group, or organization [9]. Phishers trick users into disclosing personal data, such as credit card numbers, online banking credentials, and other sensitive information.
2.1.6 IP Spoofing Attacks
Spoofing means to have the address of the computer mirror the address of a trusted computer in order to gain access to other computers. The identity of the intruder is hidden by different means, making detection and prevention difficult. With the current IP protocol technology, IP-spoofed packets cannot be eliminated [8].
2.1.7 Denial of Service
Denial of Service is an attack when the system receiving too many requests cannot return communication with the requestors [9]. The system then consumes resources waiting for the handshake to complete. Eventually, the system cannot respond to any more requests, rendering it without service.
2.2 Technology for Internet Security
Internet threats will continue to be a major issue in the global world as long as information is accessible and transferred across the Internet. Different defense and detection mechanisms were developed to deal with these attacks.
2.2.1 Cryptographic systems
Cryptography is a useful and widely used tool in security engineering today. It involves the use of codes and ciphers to transform information into unintelligible data.
2.2.2 Firewall
A firewall is a typical border control mechanism or perimeter defense. The purpose of a firewall is to block traffic from the outside, but it could also be used to block traffic from the inside. A firewall is the front line defense mechanism against intruders. It is a system designed to prevent unauthorized access to or from a private network. Firewalls can be implemented in both hardware and software, or a combination of both [8].
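As a rough illustration of the border-control idea, the sketch below checks a packet's source address and destination port against an ordered rule list and denies anything that matches no rule. It is a toy model, not how any particular firewall product works; the rule set, addresses, and the `decide` function are all hypothetical.

```python
import ipaddress

# Hypothetical rules: (action, source network, destination port or None for any port)
RULES = [
    ("allow", ipaddress.ip_network("10.0.0.0/8"), None),   # internal hosts, any port
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 443),     # anyone, HTTPS only
]


def decide(src_ip: str, dst_port: int) -> str:
    """Return the action of the first matching rule, or 'deny' if none match."""
    src = ipaddress.ip_address(src_ip)
    for action, network, port in RULES:
        if src in network and (port is None or port == dst_port):
            return action
    return "deny"    # default-deny: traffic not explicitly allowed is blocked


print(decide("10.1.2.3", 22))       # internal host
print(decide("203.0.113.9", 443))   # outside host, allowed port
print(decide("203.0.113.9", 22))    # outside host, other port
```

The default-deny fallback reflects the perimeter-defense role described above: only traffic that a rule explicitly permits crosses the border.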
2.2.3 Intrusion Detection Systems
An Intrusion Detection System (IDS) is an additional protection measure that helps ward off computer intrusions. IDS systems can be software and hardware devices used to detect an attack. IDS products are used to monitor connections in determining whether attacks are being launched. Some IDS systems just monitor and alert of an attack, whereas others try to block the attack.
2.2.4 Anti-Malware Software and Scanners
Viruses, worms and Trojan horses are all examples of malicious software, or Malware for short. Special so-called anti-Malware tools are used to detect them and cure an infected system.
2.2.5 Secure Socket Layer (SSL)
The Secure Socket Layer (SSL) is a suite of protocols that is a standard way to achieve a good level of security between a web browser and a website. SSL is designed to create a secure channel, or tunnel, between a web browser and the web server, so that any information exchanged is protected within the secured tunnel. SSL provides authentication of clients to server through the use of certificates. Clients present a certificate to the server to prove their identity.
3. Security Issues of IP Protocol IPv6
From a security point of view, IPv6 is a considerable advancement over the IPv4 internet protocol. Despite IPv6's great security mechanisms, it still continues to be vulnerable to threats. Some areas of the IPv6 protocol still pose a potential security issue.
The new internet protocol does not protect against misconfigured servers, poorly designed applications, or poorly protected sites.
The possible security problems emerge due to the following [5]:
1. Header manipulation issues
2. Flooding issues
3. Mobility issues
Header manipulation issues arise due to IPsec's embedded functionality [7]. Extension headers deter some common sources of attacks because of header manipulation. The problem is that extension headers need to be processed by all stacks, and this can lead to a long chain of extension headers. The large number of extension headers can overwhelm a certain node and is a form of attack if it is deliberate. Spoofing continues to be a security threat on the IPv6 protocol.
A type of attack called port scanning occurs when a whole section of a network is scanned to find potential targets with open services [5]. The address space of the IPv6 protocol is large but the protocol is still not invulnerable to this type of attack.
Mobility is a new feature that is incorporated into the internet protocol IPv6. The feature requires special security measures. Network administrators need to be aware of these security needs when using IPv6's mobility feature.
SECURITY IN DIFFERENT NETWORKS
The businesses today use combinations of firewalls, encryption, and authentication mechanisms to create "intranets" that are connected to the internet but protected from it at the same time.
An intranet is a private computer network that uses internet protocols. Intranets differ from "Extranets" in that the former are generally restricted to employees of the organization while extranets can generally be accessed by customers, suppliers, or other approved parties.
There does not necessarily have to be any access from the organization's internal network to the Internet itself. When such access is provided it is usually through a gateway with a firewall, along with user authentication and encryption of messages, and it often makes use of virtual private networks (VPNs).
Although intranets can be set up quickly to share data in a controlled environment, that data is still at risk unless there is tight security. The disadvantage of a closed intranet is that vital data might not get into the hands of those who need it. Intranets have a place within agencies. But for broader data sharing, it might be better to keep the networks open, with these safeguards:
1. Firewalls that detect and report intrusion attempts
2. Sophisticated virus checking at the firewall
3. Enforced rules for employee opening of e-mail attachments
4. Encryption for all connections and data transfers
5. Authentication by synchronized, timed passwords or security certificates
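The last safeguard, synchronized timed passwords, can be sketched with a time-based one-time password. The paper does not name an algorithm; the code below follows the widely used TOTP construction of RFC 6238 (HMAC-SHA1 over a 30-second time counter) as an assumption, and the secret shown is the RFC's published test key, not a real credential.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """One-time password derived from a shared secret and the current time step."""
    counter = unix_time // step                          # both sides compute the same counter
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                              # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


secret = b"12345678901234567890"     # test key from RFC 6238, for illustration only
print(totp(secret, 59, digits=8))
```

Client and server stay "synchronized" only through their clocks: each derives the same counter from the current time, so a code is valid for one time step and then expires.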
It was mentioned that if the intranet wanted access to the internet, virtual private networks are often used. Intranets that exist across multiple locations generally run over separate leased lines, or a newer approach of VPN can be utilized. VPN is a private network that uses a public network (usually the Internet) to connect remote sites or users together. Instead of using a dedicated, real-world connection such as a leased line, a VPN uses "virtual" connections routed through the Internet from the company's private network to the remote site or employee. Figure 3 is a graphical representation of an organization and VPN network.
Figure 3: A typical VPN might have a main LAN at the corporate headquarters of a company, other LANs at remote offices or facilities and individual users connecting from out in the field. [14]
CURRENT DEVELOPMENTS IN NETWORK SECURITY
The network security field is continuing down the same route. The same methodologies are being used, with the addition of biometric identification. Biometrics provides a better method of authentication than passwords, which might greatly reduce unauthorized access to secure systems. New technology such as the smart card is surfacing in research on network security. The software aspect of network security is very dynamic: new firewalls and encryption schemes are constantly being implemented.
The research being performed assists in understanding current development and projecting the future developments of the field.
1. Hardware Developments
Hardware is not developing rapidly. Biometric systems and smart cards are the only new hardware technologies that are widely impacting security.
The most obvious use of biometrics for network security is secure logon at a workstation connected to a network. Each workstation requires some software support for biometric identification of the user and, depending on the biometric being used, some hardware device. The cost of hardware devices is one thing that may lead to the widespread use of voice biometric identification, especially among companies and organizations on a low budget. Hardware devices such as computer mice with built-in thumbprint readers would be the next step up. These devices would be more expensive to implement on several computers, as each machine would require its own hardware device. A biometric mouse, with the software to support it, is available from around $120 in the U.S. The advantage of voice recognition software is that it can be centralized, thus reducing the cost of implementation per machine. At the top of the range, a centralized voice biometric package can cost up to $50,000 but may be able to manage the secure login of up to 5000 machines.
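The cost comparison above can be worked through with a few lines of arithmetic. The sketch below uses only the two price points quoted in the text and assumes, for illustration, that no other costs (installation, licences, support) apply; the function names are hypothetical.

```python
import math

MOUSE_COST = 120              # USD per biometric mouse, as quoted above
VOICE_PACKAGE_COST = 50_000   # USD, top-of-range centralized voice package
VOICE_CAPACITY = 5_000        # machines one package may be able to manage


def per_machine_cost(machines: int) -> float:
    """Cost per machine when one voice package is shared by `machines`."""
    if not 0 < machines <= VOICE_CAPACITY:
        raise ValueError("machine count outside the package capacity")
    return VOICE_PACKAGE_COST / machines


def break_even_machines() -> int:
    """Smallest fleet size at which the package costs no more than mice."""
    return math.ceil(VOICE_PACKAGE_COST / MOUSE_COST)
```

At full capacity the centralized package works out to $10 per machine, and under these assumptions it becomes the cheaper option once an organization has about 417 machines.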
The main use of biometric network security will be to replace the current password system. Maintaining password security can be a major task for even a small organization. Passwords have to be changed every few months, and people forget their password or lock themselves out of the system by incorrectly entering their password repeatedly. Very often people write their password down and keep it near their computer, which of course completely undermines any effort at network security. Biometric identification removes this problem and, while it may be expensive to set up at first, these devices save on administration and user assistance costs.
Smart cards are usually credit-card-sized digital electronic media. The card itself is designed to store encryption keys and other information used in authentication and other identification processes. The main idea behind smart cards is to provide undeniable proof of a user's identity. Smart cards can be used for everything from logging in to the network to providing secure Web communications and secure e-mail transactions.
It may seem that smart cards are nothing more than a repository for storing passwords. Obviously, someone can easily steal a smart card from someone else. Fortunately, there are safety features built into smart cards to prevent someone from using a stolen card. Smart cards require anyone who is using them to enter a personal identification number (PIN) before they'll be granted any level of access into the system. The PIN is similar to the PIN used by ATM machines.
When a user inserts the smart card into the card reader, the smart card prompts the user for a PIN. This PIN was assigned to the user by the administrator at the time the administrator issued the card to the user. Because the PIN is short and purely numeric, the user should have no trouble remembering it and therefore would be unlikely to write the PIN down.
But the interesting thing is what happens when the user inputs the PIN. The PIN is verified from inside the smart card. Because the PIN is never transmitted across the network, there is no danger of it being intercepted in transit. The main benefit, though, is that the PIN is useless without the smart card, and the smart card is useless without the PIN.
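The idea that the PIN is checked inside the card and never crosses the network can be sketched as a small challenge-response exchange. This is a toy model, not how any particular card works: the class and function names are hypothetical, and it assumes a symmetric key shared with the server when the card is issued, whereas real cards often hold an asymmetric private key instead.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class SmartCard:
    """Toy card: the PIN check and the secret key both stay on the card."""

    def __init__(self, pin: str, key: bytes):
        self._salt = secrets.token_bytes(16)
        self._pin_hash = self._hash(pin)
        self._key = key  # assumption: shared with the server at issue time

    def _hash(self, pin: str) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", pin.encode(), self._salt, 10_000)

    def respond(self, pin: str, challenge: bytes) -> Optional[bytes]:
        """Verify the PIN locally; only a keyed response leaves the card."""
        if not hmac.compare_digest(self._hash(pin), self._pin_hash):
            return None
        return hmac.new(self._key, challenge, hashlib.sha256).digest()


def issue_card(pin: str):
    """Administrator step: create the card and keep the key for the server."""
    key = secrets.token_bytes(32)
    return SmartCard(pin, key), key


def server_accepts(key: bytes, challenge: bytes, response: Optional[bytes]) -> bool:
    """Server step: check the response against its own copy of the key."""
    if response is None:
        return False
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

Only the random challenge and the keyed response travel over the network, so an eavesdropper learns neither the PIN nor the key, and a stolen card without the PIN produces no response at all.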
There are other security issues of the smart card. The smart card is cost-effective but not as secure as the biometric identification devices.
2. Software Developments
The software aspect of network security is very vast. It includes firewalls, antivirus, VPN, intrusion detection, and much more. It is not feasible to study the research development of all security software at this point. The goal is to obtain a view of where security software is heading based on the emphasis being placed now.
The improvement of the standard security software still remains the same. When new viruses emerge, the antivirus is updated to be able to guard against those threats. This process is the same for firewalls and intrusion detection systems. Many of the research papers reviewed were based on analyzing attack patterns in order to create smarter security software.
As the security hardware transitions to biometrics, the software also needs to be able to use the information appropriately. Current research is being performed on security software using neural networks. The objective of the research is to use neural networks for the facial recognition software.
Many small and complex devices can be connected to the internet. Most of the current security algorithms are computationally intensive and require substantial processing power. This power, however, is not available in small devices like sensors. Therefore, there is a need for designing lightweight security algorithms. Research in this area is currently being performed.
FUTURE TRENDS IN SECURITY
What is going to drive Internet security is the set of applications more than anything else. In the future, security may come to resemble an immune system. The immune system fights off attacks and builds itself to fight tougher enemies. Similarly, network security will be able to function as an immune system.
The trend towards biometrics could have taken place a while ago, but it seems that it isn't being actively pursued. Many security developments that are taking place are within the same set of security technology that is being used today, with some minor adjustments.
CONCLUSION
Network security is an important field that is increasingly gaining attention as the internet expands. The security threats and internet protocol were analyzed to determine the necessary security technology. The security technology is mostly software based, but many common hardware devices are used. The current development in network security is not very impressive.
Originally it was assumed that with the importance of the network security field, new approaches to security, both hardware and software, would be actively researched. It was a surprise to see most of the development taking place in the same technologies being currently used. The embedded security of the new internet protocol IPv6 may provide many benefits to internet users. Although some security issues were observed, the IPv6 internet protocol seems to evade many of the current popular attacks. Combined use of IPv6 and security tools such as firewalls, intrusion detection, and authentication mechanisms will prove effective in guarding intellectual property for the near future. The network security field may have to evolve more rapidly to deal with the threats further in the future.
REFERENCES
[1] Dowd, P.W.; McHenry, J.T., "Network security: it's time to take it seriously," Computer, vol. 31, no. 9, pp. 24-28, Sep 1998
[2] Kartalopoulos, S.V., "Differentiating Data Security and Network Security," Communications, 2008. ICC '08. IEEE International Conference on, pp. 1469-1473, 19-23 May 2008
[3] Security Overview,
www.redhat.com/docs/manuals/enterprise/RHEL4
Manual/securityguide/chsgsov.html.
[4] Molva, R., Institut Eurecom, "Internet Security Architecture," in Computer Networks & ISDN Systems Journal, vol. 31, pp. 787-804, April 1999
[5] Sotillo, S., East Carolina University, IPv6 security issues, August 2006, www.infosecwriters.com/text_resources/pdf/IPv6_SSotillo.pdf.
[6] Andress J., IPv6: the next internet protocol, April 2005,
www.usenix.com/publications/login/2005
04/pdfs/andress0504.pdf.
[7] Warfield M., Security Implications of IPv6, Internet Security Systems White Paper, documents.iss.net/whitepapers/IPv6.pdf
[8] Adeyinka, O., "Internet Attack Methods and Internet Security Technology," Modeling & Simulation, 2008. AICMS 08. Second Asia International Conference on, pp. 77-82, 13-15 May 2008
[9] Marin, G.A., "Network security basics," Security & Privacy, IEEE, vol. 3, no. 6, pp. 68-72, Nov.-Dec. 2005
[10] Internet History Timeline, www3.baylor.edu/~Sharon_P_Johnson/etg/inthistory.htm.
[11] Landwehr, C.E.; Goldschlag, D.M., "Security issues in networks with Internet access," Proceedings of the IEEE, vol. 85, no. 12, pp. 2034-2051, Dec 1997
[12] "Intranet." Wikipedia, The Free Encyclopedia. 23 Jun 2008, 10:43 UTC. Wikimedia Foundation, Inc. 2 Jul 2008 <http://en.wikipedia.org/w/index.php?title=Intranet&oldid=221174244>.
[13] "Virtual private network." Wikipedia, The Free Encyclopedia. 30 Jun 2008, 19:32 UTC. Wikimedia Foundation, Inc. 2 Jul 2008 <http://en.wikipedia.org/w/index.php?title=Virtual_private_network&oldid=222715612>.
[14] Tyson, J., How Virtual private networks work, http://www.howstuffworks.com/vpn.htm.
[15] AlSalqan, Y.Y., "Future trends in Internet security," Distributed Computing Systems, 1997., Proceedings of the Sixth IEEE Computer Society Workshop on Future Trends of, pp. 216-217, 29-31 Oct 1997
[16] Curtin, M., Introduction to Network Security, http://www.interhack.net/pubs/networksecurity.
[17] Improving Security, http://www.cert.org/tech_tips, 2006.
[18] Serpanos, D.N.; Voyiatzis, A.G., "Secure network design: A layered approach," Autonomous Decentralized System, 2002. The 2nd International Workshop on, pp. 95-100, 6-7 Nov. 2002
[19] Ohta, T.; Chikaraishi, T., "Network security model," Networks, 1993. International Conference on Information Engineering '93. 'Communications and Networks for the Year 2000', Proceedings of IEEE Singapore International Conference on, vol. 2, pp. 507-511, 6-11 Sep 1993
Introduction to Web Technologies

Tara Murphy
17th February, 2011
The Internet
CGI
Web services
HTML and CSS
The Internet is a network of networks

The Internet is the descendant of ARPANET (Advanced
Research Projects Agency Network) developed for the US DoD

The initial goal was to research the possibility of remote
communication between machines
A critical step was the development of the TCP/IP protocol (1977)
TCP: Transmission Control Protocol
IP: Internet Protocol
Vinton Cerf's postcard analogy for TCP/IP:
A document is broken up into postcard-sized chunks (packets)

Each postcard has its own address and sequence number
Each postcard travels independently to the final destination
The document is reconstructed by ordering the postcards
If one is missing, the recipient can request for it to be resent
If a post-office is closed the postcard is sent a different way
Congestion and service interruptions do not stop transmission
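The postcard analogy can be sketched in a few lines. This is an illustration of the idea only (sequence numbers, independent arrival, reordering, re-requesting what is missing), not of the real TCP protocol; the function names and the 8-character postcard size are made up for the example.

```python
import random


def to_postcards(document: str, size: int = 8):
    """Split a document into (sequence_number, chunk) postcards."""
    return [(seq, document[i:i + size])
            for seq, i in enumerate(range(0, len(document), size))]


def missing(postcards, total: int):
    """Sequence numbers that never arrived and must be requested again."""
    received = {seq for seq, _ in postcards}
    return [seq for seq in range(total) if seq not in received]


def reassemble(postcards):
    """Rebuild the document by ordering the postcards on sequence number."""
    return "".join(chunk for _, chunk in sorted(postcards))


def deliver(document: str) -> str:
    """Simulate out-of-order delivery with one lost postcard."""
    sent = to_postcards(document)
    arrived = sent[:]
    random.shuffle(arrived)              # each postcard travels independently
    arrived.pop()                        # one postcard goes missing
    for seq in missing(arrived, len(sent)):
        arrived.append(sent[seq])        # the recipient asks for it again
    return reassemble(arrived)
```

Calling deliver() on any non-empty string returns the original text, however the postcards were shuffled and whichever one was lost.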
Astroinformatics School: Web Technologies
The first connection between two hosts
Image Ref: http://www.computerhistory.org
The Internet grew extremely rapidly!

[Figure: number of Internet hosts (millions, axis from 0 to 450) by date, August 1981 to August 2005]
Data Ref: http://www.isc.org/

The World Wide Web operates over the Internet

We often use the phrases the WWW and the Internet
interchangeably, however they are different entities

The WWW is a service that operates over the internet
The concept of the WWW combines 4 ideas:
hypertext
resource identifiers (URI, URL)
client-server model of computing (web servers/browsers)
markup language (HTML)
These were the brainchild of Tim Berners-Lee from CERN
who released his first browser in 1991

All clients and servers in the WWW speak the language of
HTTP (HyperText Transfer Protocol)
We can generate content dynamically

There are several benefits to dynamically generating content:
We don't have to store loads of pages
The content is completely up-to-date
We can respond to/interact with the user
Every site that involves a transaction (eg. Google, Amazon,
NED) is generating dynamic content
Web servers serve content on request across the network

The web server is responsible for:
accepting requests for content described by the url
checking whether access is permitted and requesting
authentication if necessary
sending (or serving) the content back to the browser
A web server is the machine and the process serving content
The most popular web server software now is:
Apache is an open source web server (Unix/Mac OS X/Win)
Microsoft IIS is the main Windows web server (Win only)
Browsers and servers communicate via http

Web Browser -> Web Server: GET /index.html HTTP/1.0
Web Server -> Web Browser: HTTP/1.1 200 OK
HyperText Transfer Protocol (http) is the standard protocol
for transferring web content

The server listens on port 80 waiting for connections
The web browser connects to the server, and sends a request
The server responds with an error code or the web content
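The request and response exchange above can be tried out with Python's standard library. This is a minimal sketch: the handler class and the fetch() helper are hypothetical names, and the server listens on a free local port chosen by the operating system rather than on port 80, so that it runs without special privileges.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Serves one page and answers every other URL with an error code."""

    def do_GET(self):
        if self.path == "/index.html":
            body = b"<p>Hello world!</p>"
            self.send_response(200)                      # status line: 200 OK
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)                       # the web content
        else:
            self.send_error(404)                         # an error code

    def log_message(self, *args):
        pass                                             # keep the demo quiet


def fetch(path: str):
    """Start a local server, send one GET request, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", path)                        # the browser's request
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

fetch("/index.html") plays the part of the browser asking for a page; asking for any other path shows the server responding with an error code instead of content.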
The server runs a program to generate the web content

This program gets run every time the given url is requested
The server passes the http request details to the program
The program returns the web content or an error code
Each web server interacts with the programs differently:
Apache uses the Common Gateway Interface (cgi)
Microsoft IIS uses Active Server Pages (asp)
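A CGI program can be very small. The sketch below assumes the usual CGI convention that the server passes request details through environment variables such as QUERY_STRING and reads the response (headers, a blank line, then the content) from the program's standard output; the render() function and the "name" parameter are invented for the example.

```python
import html
import os
from urllib.parse import parse_qs


def render(environ) -> str:
    """Build a complete CGI response: headers, blank line, then the body."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]
    # Escape user input before placing it in the page.
    body = "<p>Hello, {}!</p>".format(html.escape(name))
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # Run by the web server each time the script's url is requested.
    print(render(os.environ), end="")
```

Requesting the script's url with ?name=Tara appended would produce a page containing "Hello, Tara!"; with no query string it falls back to "Hello, world!".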
Browsing the web uses the client-server model

The client-server model involves networked interaction between:
a client: in this case the web browser
a server: in this case the web server
Dynamic content is generated on the server side
The advantages of server side are:
We are not running programs on low-powered client computers
Typically the data you want to present is on server side
The client will restrict program functionality for security
The disadvantages of server side are:
The server requires lots of processing power
particularly when there are many simultaneous clients
The client side is often quite powerful anyway
Lots of information may need to be passed back and forth
The CGI client-server interaction

[Diagram: the browser (client) sends a query to the server, which runs a CGI script and returns HTML]
A web service is an application accessible over the Internet

Web services emerged amidst a lot of hype
A web service is a network accessible interface to application
functionality, built using standard internet technologies.

Powerful new way to build software systems from distributed
components
In other words, if an application can be accessed over a
network using protocols such as HTTP, XML, SMTP etc.

then it is a web service.
Web services use the client-server model

Recall the CGI client-server model
In the case of a user looking at a webpage
the client is the web browser
the server is the web server (and programs running on it)
On the WWW information is always returned to the client in
the form of a webpage (HTML).

The key to web services is that they return information in a
programmatic form (i.e. they can return a string, float, array,
object, just like a function).
In the final stage of a chain of web services, the information
may be presented to the user, e.g. a webpage may be
generated.
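The difference between a web service and a web page can be shown side by side. In this sketch the service returns its result as JSON, which is an assumption made for brevity (the slides mention XML among the standard technologies), and the price table is made-up data.

```python
import json

PRICES = {"ACME": 12.5, "INIT": 101.0}   # hypothetical data for the example


def quote_service(symbol: str) -> str:
    """Web service: returns structured data meant for another program."""
    return json.dumps({"symbol": symbol, "price": PRICES[symbol]})


def quote_page(symbol: str) -> str:
    """Final stage of the chain: call the service, then build HTML for a person."""
    result = json.loads(quote_service(symbol))
    return "<p>{symbol}: {price:.2f}</p>".format(**result)
```

A program calling quote_service() gets values it can compute with directly, while quote_page() is the only step that produces something intended for a browser.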
The Web service-client interaction

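[Diagram: the browser (client) sends a query to a CGI server and receives HTML; the CGI server in turn acts as a client, sending a query to a web service on another server and receiving a result]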
The Web service-client interaction

[Diagram: the same interaction extended with a second web service, with queries and results passed between the servers]
Example web services

Stock price quotes
Amazon web services
provides access to the entire Amazon database of books/prices
you could aggregate prices for multiple online bookshops
Google web services
originally just access to Google search engine results
people used to do this manually anyway (screen scraping)
now extended to other services, e.g. Google maps
And lots of astronomy/VO applications
Andreas will show some examples this afternoon
The HyperText Markup Language

HTML marks up the structure of a document for publishing
on the WWW
It tells the browser how to interpret and display the document
Different browsers interpret things differently (!)
There are two main standards: HTML 4(5) and XHTML 1.0
These are developed by W3C, the World Wide Web Consortium
All HTML documents should declare which standard they are
using
Hello world!
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
  <HEAD>
    <TITLE>My first HTML document</TITLE>
  </HEAD>
  <BODY>
    <P>Hello world!
  </BODY>
</HTML>
The basic unit of HTML is the element

HTML includes element types to represent paragraphs,
hypertext links, lists, tables, images, etc

Each element consists of three parts
1 start tag e.g. <title>
2 content e.g. my homepage
3 end tag e.g. </title>
A tag is an element name enclosed in angle brackets

Some elements have no content e.g. <br> or <hr>
Elements may have associated properties (attributes)
Attributes and their values appear inside the start tag
e.g. <div id="section1">
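The three parts of an element, and the attributes inside the start tag, can be seen by running a short piece of markup through Python's standard html.parser module. This is only an illustration; the ElementLogger class and the parts() helper are names made up for the example.

```python
from html.parser import HTMLParser


class ElementLogger(HTMLParser):
    """Records start tags (with attributes), content and end tags in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("content", data.strip()))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def parts(markup: str):
    """Return the list of events found in a piece of HTML."""
    parser = ElementLogger()
    parser.feed(markup)
    parser.close()
    return parser.events
```

For the markup `<div id="section1">my homepage</div><br>` this reports a start tag with its id attribute, the content, and the end tag, followed by a lone start tag for the <br> element, which has no content.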
You only need a small set of elements to create a website

Element: start/end tags    Description
<html> </html>             Starts and ends a HTML document
<title> </title>           Text that appears in the title bar
<head> </head>             Information about the document
<body> </body>             The main part of the document
<p> </p>                   A paragraph
<hr />                     A horizontal line
<br />                     A line break
<a href="url"> </a>        A link
<img src="url" />          An image
<!-- -->                   Comments that are not displayed
You only need a small set of elements to create a website

Element: start/end tags    Description
<div> </div>               A section in the document
<span> </span>             An inline section in a document
<ul> </ul>                 An unordered list (bullet points)
<ol> </ol>                 An ordered list
<li> </li>                 A list item
<table> </table>           Encloses a table
<tr> </tr>                 A row in a table
<td> </td>                 A cell within a row
<pre> </pre>               Enclosed text that stays in its raw format
CSS was introduced into HTML 4.0 to solve a problem

We have focused on the structural aspects of HTML
In fact that is what HTML was originally designed for
<table> = This is a table
<p> = This is a paragraph
Layout was the job of the browser
As the WWW exploded, more people started writing
documents
The two major browsers (Internet Explorer and Netscape)
added new HTML tags and attributes to the original HTML
specification, e.g. <font>
It became hard to separate structure and presentation
Formatting before CSS was inefficient

Before CSS all formatting had to be included as attributes in
HTML tags
<font face="Verdana, Arial" size="+1" color="blue">
  Hello, World!
</font>
There are several disadvantages to this way of doing things

Information occurs in many locations, so redundancy leads to errors
Updating multiple occurrences of information is
time-consuming
Formatting information is hard-coded in HTML document
HTML elements can describe format/presentation and
content/structure
Other formatting tags you might be familiar with are <b>
(bold) and <i> (italics); we do not recommend using these
Hello World! the CSS version

To reproduce the previous HTML using CSS we need two files
1 A HTML page (e.g. mypage.html) containing this
<head>
  <link href="css/mystyle.css" rel="stylesheet"
        type="text/css" />
</head>
<body>
  <p>Hello, World!</p>
</body>
2 An accompanying style sheet file (e.g. mystyle.css)
p {
  color: blue;
  font-size: small;
  font-family: Verdana, Arial, sans-serif;
}
HTML and CSS should be validated

The W3C site provides tools for validating your website
they check what standard you claim to be using
then check all the syntax in your document complies with that
standard
The validators are free and easy to use, so there is no excuse!
http://validator.w3.org/
http://jigsaw.w3.org/css-validator/
References
http://www.computerhistory.org
http://www.anu.edu.au/people/Roger.Clarke/II/OzIHist.html
HTML: http://www.w3.org/MarkUp/
HTML: http://www.w3schools.com/html/
XHTML: http://www.w3.org/MarkUp/2004/xhtml-faq
XHTML: http://www.w3schools.com/xhtml/xhtml html.asp
CSS: http://www.w3.org/Style/CSS/
CSS: http://www.csszengarden.com/
Network Security: History, Importance, and Future
University of Florida Department of Electrical and Computer Engineering
Bhavya Daya
ABSTRACT
Network security has become more important to personal computer users, organizations, and the military. With the advent of the internet, security became a major concern, and the history of security allows a better understanding of the emergence of security technology. The internet structure itself allowed for many security threats to occur. The architecture of the internet, when modified, can reduce the possible attacks that can be sent across the network. Knowing the attack methods allows for the appropriate security to emerge. Many businesses secure themselves from the internet by means of firewalls and encryption mechanisms. The businesses create an "intranet" to remain connected to the internet but secured from possible threats.
The entire field of network security is vast and in an evolutionary stage. The range of study encompasses a brief history dating back to the internet's beginnings and the current development in network security. In order to understand the research being performed today, background knowledge of the internet, its vulnerabilities, attack methods through the internet, and security technology is important and therefore they are reviewed.
INTRODUCTION
The world is becoming more interconnected with the advent of the Internet and new networking technology. There is a large amount of personal, commercial, military, and government information on networking infrastructures worldwide. Network security is becoming of great importance because of intellectual property that can be easily acquired through the internet.
There are currently two fundamentally different networks: data networks and synchronous networks comprised of switches. The internet is considered a data network. Since the current data network consists of computer-based routers, information can be obtained by special programs, such as "Trojan horses," planted in the routers. The synchronous network that consists of switches does not buffer data and therefore is not threatened by attackers. That is why security is emphasized in data networks, such as the internet, and other networks that link to the internet.
The vast topic of network security is analyzed by researching the following:
1. History of security in networks
2. Internet architecture and vulnerable security aspects of the Internet
3. Types of internet attacks and security methods
4. Security for networks with internet access
5. Current development in network security hardware and software
Based on this research, the future of network security is forecasted. New trends that are emerging will also be considered to understand where network security is heading.
1. Network Security
System and network technology is a key technology for a wide variety of applications. Security is crucial to networks and applications. Although network security is a critical requirement in emerging networks, there is a significant lack of security methods that can be easily implemented.
There exists a "communication gap" between the developers of security technology and developers of networks. Network design is a well-developed process that is based on the Open Systems Interconnection (OSI) model. The OSI model has several advantages when designing networks. It offers modularity, flexibility, ease of use, and standardization of protocols. The protocols of different layers can be easily combined to create stacks which allow modular development. The implementation of individual layers can be changed later without making other adjustments, allowing flexibility in development. In contrast to network design, secure network design is not a well-developed process. There isn't a methodology to manage the complexity of security requirements. Secure network design does not contain the same advantages as network design.
An effective network security plan is developed with the understanding of security issues, potential attackers, needed level of security, and factors that make a network vulnerable to attack [1]. The steps involved in understanding the composition of a secure network, internet or otherwise, are followed throughout this research endeavor.
To lessen the vulnerability of the computer to the network there are many products available. These tools are encryption, authentication mechanisms, intrusion detection, security management and firewalls. Businesses throughout the world are using a combination of some of these tools. "Intranets" are both connected to the internet and reasonably protected from it. The internet architecture itself leads to vulnerabilities in the network. Understanding the security issues of the internet greatly assists in developing new security technologies and approaches for networks with internet access and internet security itself.
The types of attacks through the internet need to also be studied to be able to detect and guard against them. Intrusion detection systems are established based on the types of attacks most commonly used. Network intrusions consist of packets that are introduced to cause problems for the following reasons:
To consume resources uselessly
To interfere with any system resource's intended function
To gain system knowledge that can be exploited in later attacks
The last reason for a network intrusion is most commonly guarded against and considered by most as the only intrusion motive. The other reasons mentioned need to be thwarted as well.
When considering network security, it must be emphasized that the whole network needs to be secure. Network security does not only concern the security in the computers at each end of the communication chain. When transmitting data the communication channel should not be vulnerable to attack. A possible hacker could target the communication channel, obtain the data, decrypt it and reinsert a false message. Securing the network is just as important as securing the computers and encrypting the message.
When developing a secure network, the following need to be considered [1]:
1. Access: authorized users are provided the means to communicate to and from a particular network
2. Confidentiality: information in the network remains private
3. Authentication: ensure the users of the network are who they say they are
4. Integrity: ensure the message has not been modified in transit
5. Non-repudiation: ensure the user does not refute that he used the network
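The integrity and authentication requirements can be illustrated with a message authentication code. The sketch below is one common technique and not a method taken from the paper: it assumes the sender and receiver already share a secret key, and the function names seal() and verify() are hypothetical.

```python
import hashlib
import hmac
import secrets


def seal(key: bytes, message: bytes) -> bytes:
    """Sender: compute an authentication tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver: accept only if the tag matches the message under the key."""
    return hmac.compare_digest(seal(key, message), tag)


def demo() -> bool:
    """A message altered in transit no longer matches its tag."""
    key = secrets.token_bytes(32)        # assumption: agreed in advance
    tag = seal(key, b"pay 100")
    return verify(key, b"pay 100", tag) and not verify(key, b"pay 900", tag)
```

A shared-key tag of this kind addresses integrity and authentication only. Because either party could have produced the tag, it does not provide non-repudiation, which generally calls for digital signatures.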
Typical security currently exists on the computers connected to the network. Security protocols usually appear as part of a single layer of the OSI network reference model. Current work is being performed in using a layered approach to secure network design. The layers of the security model correspond to the OSI model layers. This security approach leads to an effective and efficient design which circumvents some of the common security problems.
2. Differentiating Data Security and Network Security
Data security is the aspect of security that allows a client's data to be transformed into unintelligible data for transmission. Even if this unintelligible data is intercepted, a key is needed to decode the message. This method of security is effective to a certain degree. Strong cryptography in the past can be easily broken today. Cryptographic methods have to continue to advance due to the advancement of the hackers as well.
When transferring ciphertext over a network, it is helpful to have a secure network. This will allow for the ciphertext to be protected, so that it is less likely for many people to even attempt to break the code. A secure network will also prevent someone from inserting unauthorized messages into the network. Therefore, hard ciphers are needed as well as attack-hard networks [2].
The relationship of network security and data security to the OSI model is shown in Figure 1. It can be seen that the cryptography occurs at the application layer; therefore the application writers are aware of its existence. The user can possibly choose different methods of data security. Network security is mostly contained within the physical layer. Layers above the physical layer are also used to accomplish the network security required [2]. Authentication is performed on a layer above the physical layer. Network security in the physical layer requires failure detection, attack detection mechanisms, and intelligent countermeasure strategies [2].
Figure 1: Based on the OSI model, data security and network security have a different security function [2].
HISTORY OF NETWORK SECURITY
Recent interest in security was fueled by the crime committed by Kevin Mitnick. Kevin Mitnick committed the largest computer-related crime in U.S. history [3]. The losses were eighty million dollars in U.S. intellectual property and source code from a variety of companies [3]. Since then, information security came into the spotlight.
Public networks are being relied upon to deliver financial and personal information. Due to the evolution of information that is made available through the internet, information security is also required to evolve. Due to Kevin Mitnick's offense, companies are emphasizing security for their intellectual property. The Internet has been a driving force for data security improvement.
Internet protocols in the past were not developed to secure themselves. Within the TCP/IP communication stack, security protocols are not implemented. This leaves the internet open to attacks. Modern developments in the internet architecture have made communication more secure.
2. Security Timeline
Several key events contributed to the birth and evolution of computer and network security. The timeline can be started as far back as the 1930s.
The Enigma machine, which converted plain messages to encrypted text, was invented in Germany in 1918. Polish cryptographers first broke its code in the early 1930s, and Alan Turing, a brilliant mathematician, played a central role in breaking wartime Enigma traffic. Securing communications was essential in World War II.
In the 1960s, the term "hacker" is coined by a couple of Massachusetts Institute of Technology (MIT) students. The Department of Defense began the ARPANet, which gains popularity as a conduit for the electronic exchange of data and information [3]. This paves the way for the creation of the carrier network known today as the Internet. During the 1970s, the Telnet protocol was developed. This opened the door for public use of data networks that were originally restricted to government contractors and academic researchers [3].
During the 1980s, the hackers and crimes relating to computers were beginning to emerge. The 414 gang are raided by authorities after a nine-day cracking spree where they break into top-secret systems. The Computer Fraud and Abuse Act of 1986 was created because of Ian Murphy's crime of stealing information from military computers. A graduate student, Robert Morris, was convicted for unleashing the Morris Worm to over 6,000 vulnerable computers connected to the Internet. Based on concerns that the Morris Worm ordeal could be replicated, the Computer Emergency Response Team (CERT) was created to alert computer users of network security issues.
In the 1990s, Internet became public and the security concerns increased tremendously. Approximately 950 million people use the internet today worldwide [3]. On any day, there are approximately 225 major incidences of a security
1. Brief History of Internet
The birth of the internet takes place in 1969 when Advanced Research Projects Agency Network (ARPANet) is commissioned by the Department of Defense (DOD) for research in networking.
TheARPANETisasuccessfromtheverybeginning.
Although originally designed to allow scientists to
share data and access remote computers, email
quicklybecomesthemostpopularapplication.The
ARPANETbecomesahighspeeddigitalpost office
aspeopleuseittocollaborateonresearchprojects
and discuss topics of various interests. The
InterNetworking Working Group becomes the first
of several standardssetting entities to govern the
growing network [10]. Vinton Cerf is elected the
first chairman of the INWG, and later becomes
knownasa"FatheroftheInternet."[10]
In the 1980s, Bob Kahn and Vinton Cerf are key

members of a team that create TCP/IP, the
common language of all Internet computers. For
the first time the loose collection of networks
which made up the ARPANET is seen as an
"Internet",andtheInternetasweknowittodayis
born. The mid80s marks a boom in the personal
computer and superminicomputer industries. The
combinationofinexpensivedesktopmachinesand
powerful, networkready servers allows many
companies to join the Internet for the first time.
Corporations begin to use the Internet to
communicate with each other and with their
customers.
In the 1990s, the internet began to become

available to the public. The World Wide Web was
born. Netscape and Microsoft were both
competing on developing a browser for the
internet. Internet continues to grow and surfing
the internet has become equivalent to TV viewing
formanyusers.
breach [3]. These security breaches could also

result in monetary losses of a large degree.
Investment in proper security should be a priority
forlargeorganizationsaswellascommonusers.
INTERNET ARCHITECTURE AND VULNERABLE SECURITY ASPECTS
Fear of security breaches on the Internet is causing organizations to use protected private networks or intranets [4]. The Internet Engineering Task Force (IETF) has introduced security mechanisms at various layers of the Internet Protocol Suite [4]. These security mechanisms allow for the logical protection of data units that are transferred across the network.
The current version and new version of the Internet Protocol are analyzed to determine the security implications. Although security may exist within the protocol, certain attacks cannot be guarded against. These attacks are analyzed to determine other security mechanisms that may be necessary.
The security architecture of the internet protocol, known as IP Security, is a standardization of internet security. IP security, IPsec, covers the new generation of IP (IPv6) as well as the current version (IPv4). Although new techniques, such as IPsec, have been developed to overcome the internet's best-known deficiencies, they seem to be insufficient [5]. Figure 2 shows a visual representation of how IPsec is implemented to provide secure communications.
IPsec is a point-to-point protocol: one side encrypts, the other decrypts, and both sides share a key or keys. IPsec can be used in two modes, namely transport mode and tunnel mode.
Figure 2: IPsec contains a gateway and a tunnel in order to secure communications. [17]
1. IPv4 and IPv6 Architectures
IPv4 was designed in 1980 to replace the NCP protocol on the ARPANET. IPv4 displayed many limitations after two decades [6]. The IPv6 protocol was designed with IPv4's shortcomings in mind. IPv6 is not a superset of the IPv4 protocol; instead it is a new design.
The internet protocol's design is so vast that it cannot be covered fully. The main parts of the architecture relating to security are discussed in detail.
1.1 IPv4 Architecture
The protocol contains a couple of aspects which caused problems with its use. These problems do not all relate to security. They are mentioned to gain a comprehensive understanding of the internet protocol and its shortcomings. The causes of problems with the protocol are:
1. Address Space
2. Routing
3. Configuration
4. Security
5. Quality of Service
The IPv4 architecture has an address that is 32 bits wide [6]. This limits the maximum number of computers that can be connected to the internet. The 32-bit address provides for a maximum of 2^32, roughly 4.3 billion, addresses. The problem of exceeding that number was not foreseen when the protocol was created. The small address space of IPv4 facilitates malicious code distribution [5].
Routing is a problem for this protocol because the routing tables are constantly increasing in size. The maximum theoretical size of the global routing tables was 2.1 million entries [6]. Methods have been adopted to reduce the number of entries in the routing table. This is helpful for a short period of time, but drastic change needs to be made to address this problem.
The TCP/IP-based networking of IPv4 requires that the user supplies some data in order to configure a network. Some of the information required is the IP address, routing gateway address, subnet mask, and DNS server. The simplicity of configuring the network is not evident in the IPv4 protocol. The user can request appropriate network configuration from a central server [6]. This eases configuration hassles for the user but not for the network's administrators.
The lack of embedded security within the IPv4 protocol has led to the many attacks seen today. Mechanisms to secure IPv4 do exist, but there are no requirements for their use [6]. IPsec is a specific mechanism used to secure the protocol. IPsec secures the packet payloads by means of cryptography. IPsec provides the services of confidentiality, integrity, and authentication [6]. This form of protection does not account for the skilled hacker who may be able to break the encryption method and obtain the key.
When the internet was created, the quality of service (QoS) was standardized according to the information that was transferred across the network. The original transfer of information was mostly text-based. As the internet expanded and technology evolved, other forms of communication began to be transmitted across the internet. The quality of service needed for streaming videos and music is much different from that for standard text. The protocol does not have the functionality of dynamic QoS that changes based on the type of data being communicated [6].
1.2 IPv6 Architecture
When IPv6 was being developed, emphasis was placed on aspects of the IPv4 protocol that needed to be improved. The development efforts were placed in the following areas:
1. Routing and addressing
2. Multi-protocol architecture
3. Security architecture
4. Traffic control
The IPv6 protocol's address space was extended by supporting 128-bit addresses. With 128-bit addresses, the protocol can support up to 3.4 × 10^38 machines. The address bits are used less efficiently in this protocol because it simplifies addressing configuration.
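As a quick check on the address-space figures for the two protocols, the arithmetic can be sketched in a few lines of Python. This is an illustrative calculation only; the function name `address_count` is not from the paper.

```python
def address_count(bits: int) -> int:
    """Number of distinct addresses an address field of `bits` bits can encode."""
    return 2 ** bits

ipv4_total = address_count(32)    # 4,294,967,296, about 4.3 billion
ipv6_total = address_count(128)   # about 3.4 x 10^38

print(f"IPv4: {ipv4_total:,} addresses")
print(f"IPv6: {ipv6_total:.1e} addresses")
# Each IPv4 address corresponds to 2^96 IPv6 addresses.
print(f"IPv6 addresses per IPv4 address: 2^{128 - 32}")
```

These are upper bounds on the number of distinct addresses; the number of usable host addresses is lower in practice because some ranges are reserved.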
The IPv6 routing system is more efficient and enables smaller global routing tables. The host configuration is also simplified. Hosts can automatically configure themselves. This new design allows ease of configuration for the user as well as the network administrator.
The security architecture of the IPv6 protocol is of great interest. IPsec is embedded within the IPv6 protocol. IPsec functionality is the same for IPv4 and IPv6. The only difference is that IPv6 can utilize the security mechanism along the entire route [6].
The quality of service problem is handled with IPv6. The internet protocol allows for special handling of certain packets with a higher quality of service.
From a high-level view, the major benefits of IPv6 are its scalability and increased security. IPv6 also offers other interesting features that are beyond the scope of this paper.
It must be emphasized that after researching IPv6 and its security features, it is not necessarily more secure than IPv4. The approach to security is only slightly better, not a radical improvement.
2. Attacks through the Current Internet Protocol IPv4
There are four main computer security attributes. They were mentioned before in a slightly different form, but are restated for convenience and emphasis. These security attributes are confidentiality, integrity, privacy, and availability.
Confidentiality and integrity still hold to the same definition. Availability means the computer assets can be accessed by authorized people [8]. Privacy is the right to protect personal secrets [8]. Various attack methods relate to these four security attributes. Table 1 shows the attack methods and solutions.
Table 1: Attack Methods and Security Technology [8]
Common attack methods and the security technology will be briefly discussed. Not all of the methods in the table above are discussed. The current technology for dealing with attacks is reviewed in order to comprehend the current research developments in security hardware and software.
2.1 Common Internet Attack Methods
Common internet attack methods are broken down into categories. Some attacks gain system knowledge or personal information, such as eavesdropping and phishing. Attacks can also interfere with the system's intended function, such as viruses, worms and trojans. The other form of attack is when the system's resources are consumed uselessly; this can be caused by a denial of service (DoS) attack. Other forms of network intrusions also exist, such as land attacks, smurf attacks, and teardrop attacks. These attacks are not as well known as DoS attacks, but they are used in some form or another even if they aren't mentioned by name.
2.1.1 Eavesdropping
Interception of communications by an unauthorized party is called eavesdropping. Passive eavesdropping is when the person only secretly listens to the networked messages. On the other hand, active eavesdropping is when the intruder listens and inserts something into the communication stream. This can lead to the messages being distorted. Sensitive information can be stolen this way [8].
2.1.2 Viruses
Viruses are self-replicating programs that use files to infect and propagate [8]. Once a file is opened, the virus will activate within the system.
2.1.3 Worms
A worm is similar to a virus because they both are self-replicating, but the worm does not require a file to allow it to propagate [8]. There are two main types of worms, mass-mailing worms and network-aware worms. Mass-mailing worms use email as a means to infect other computers. Network-aware worms are a major problem for the Internet. A network-aware worm selects a target and once the worm accesses the target host, it can infect it by means of a Trojan or otherwise.
2.1.4 Trojans
Trojans appear to be benign programs to the user, but will actually have some malicious purpose. Trojans usually carry some payload such as a virus [8].
2.1.5 Phishing
Phishing is an attempt to obtain confidential information from an individual, group, or organization [9]. Phishers trick users into disclosing personal data, such as credit card numbers, online banking credentials, and other sensitive information.
2.1.6 IP Spoofing Attacks
Spoofing means to have the address of the computer mirror the address of a trusted computer in order to gain access to other computers. The identity of the intruder is hidden by different means, making detection and prevention difficult. With the current IP protocol technology, IP-spoofed packets cannot be eliminated [8].
2.1.7 Denial of Service
Denial of Service is an attack in which a system receiving too many requests cannot return communication with the requestors [9]. The system then consumes resources waiting for the handshake to complete. Eventually, the system cannot respond to any more requests, rendering it without service.
2.2 Technology for Internet Security
Internet threats will continue to be a major issue in the global world as long as information is accessible and transferred across the Internet. Different defense and detection mechanisms were developed to deal with these attacks.
2.2.1 Cryptographic systems
Cryptography is a useful and widely used tool in security engineering today. It involves the use of codes and ciphers to transform information into unintelligible data.
2.2.2 Firewall
A firewall is a typical border control mechanism or perimeter defense. The purpose of a firewall is to block traffic from the outside, but it could also be used to block traffic from the inside. A firewall is the front line defense mechanism against intruders. It is a system designed to prevent unauthorized access to or from a private network. Firewalls can be implemented in both hardware and software, or a combination of both [8].
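The filtering role of a firewall can be illustrated with a toy packet filter. The sketch below is only an assumption about how such rules might be modeled; the `Rule` type, the example networks, and the first-match, default-deny policy are hypothetical and are not taken from the paper or from any particular firewall product.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

@dataclass(frozen=True)
class Rule:
    action: str            # "allow" or "deny"
    source: str            # CIDR block the packet's source address must fall in
    port: Optional[int]    # destination port, or None to match any port

# Hypothetical rule set; the first matching rule wins.
RULES = [
    Rule("deny", "203.0.113.0/24", None),   # block one untrusted network entirely
    Rule("allow", "0.0.0.0/0", 443),        # HTTPS from anywhere
    Rule("allow", "10.0.0.0/8", 22),        # SSH from the internal network only
]

def decide(src: str, dst_port: int, rules=RULES) -> str:
    """Return the action of the first matching rule, or deny by default."""
    for rule in rules:
        in_network = ip_address(src) in ip_network(rule.source)
        port_ok = rule.port is None or rule.port == dst_port
        if in_network and port_ok:
            return rule.action
    return "deny"

print(decide("198.51.100.7", 443))   # outside host asking for HTTPS
print(decide("198.51.100.7", 22))    # outside host asking for SSH
print(decide("10.1.2.3", 22))        # inside host asking for SSH
```

Denying by default means that traffic not explicitly anticipated by a rule is blocked, which matches the idea of a perimeter defense.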
2.2.3 Intrusion Detection Systems
An Intrusion Detection System (IDS) is an additional protection measure that helps ward off computer intrusions. IDS systems can be software and hardware devices used to detect an attack. IDS products are used to monitor connections to determine whether attacks have been launched. Some IDS systems just monitor and alert of an attack, whereas others try to block the attack.
2.2.4 Anti-Malware Software and scanners
Viruses, worms and Trojan horses are all examples of malicious software, or Malware for short. Special so-called anti-Malware tools are used to detect them and cure an infected system.
2.2.5 Secure Socket Layer (SSL)
The Secure Socket Layer (SSL) is a suite of protocols that is a standard way to achieve a good level of security between a web browser and a website. SSL is designed to create a secure channel, or tunnel, between a web browser and the web server, so that any information exchanged is protected within the secured tunnel. SSL provides authentication of clients to server through the use of certificates. Clients present a certificate to the server to prove their identity.
3. Security Issues of IP Protocol IPv6
From a security point of view, IPv6 is a considerable advancement over the IPv4 internet protocol. Despite IPv6's great security mechanisms, it still continues to be vulnerable to threats. Some areas of the IPv6 protocol still pose a potential security issue.
The new internet protocol does not protect against misconfigured servers, poorly designed applications, or poorly protected sites.
The possible security problems emerge due to the following [5]:
1. Header manipulation issues
2. Flooding issues
3. Mobility issues
Header manipulation issues arise due to IPsec's embedded functionality [7]. Extension headers deter some common sources of attacks because of header manipulation. The problem is that extension headers need to be processed by all stacks, and this can lead to a long chain of extension headers. The large number of extension headers can overwhelm a certain node and is a form of attack if it is deliberate. Spoofing continues to be a security threat on the IPv6 protocol.
A type of attack called port scanning occurs when a whole section of a network is scanned to find potential targets with open services [5]. The address space of the IPv6 protocol is large but the protocol is still not invulnerable to this type of attack.
Mobility is a new feature that is incorporated into the internet protocol IPv6. The feature requires special security measures. Network administrators need to be aware of these security needs when using IPv6's mobility feature.
SECURITY IN DIFFERENT NETWORKS
The businesses today use combinations of firewalls, encryption, and authentication mechanisms to create intranets that are connected to the internet but protected from it at the same time.
An intranet is a private computer network that uses internet protocols. Intranets differ from "Extranets" in that the former are generally restricted to employees of the organization while extranets can generally be accessed by customers, suppliers, or other approved parties.
There does not necessarily have to be any access from the organization's internal network to the Internet itself. When such access is provided it is usually through a gateway with a firewall, along with user authentication and encryption of messages, and it often makes use of virtual private networks (VPNs).
Although intranets can be set up quickly to share data in a controlled environment, that data is still at risk unless there is tight security. The disadvantage of a closed intranet is that vital data might not get into the hands of those who need it. Intranets have a place within agencies. But for broader data sharing, it might be better to keep the networks open, with these safeguards:
1. Firewalls that detect and report intrusion attempts
2. Sophisticated virus checking at the firewall
3. Enforced rules for employee opening of e-mail attachments
4. Encryption for all connections and data transfers
5. Authentication by synchronized, timed passwords or security certificates
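Item 5 mentions synchronized, timed passwords. One widely used scheme of that kind is the time-based one-time password (TOTP) of RFC 6238; the paper does not name a specific scheme, so this is shown only as an illustration. The sketch assumes a secret shared between user and server, a 30-second time step, and 6-digit codes.

```python
import hashlib
import hmac
import struct
import time

def totp(secret: bytes, at: float, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238, HMAC-SHA1 variant)."""
    counter = int(at // step)                     # number of elapsed time steps
    msg = struct.pack(">Q", counter)              # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                    # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def verify(secret: bytes, candidate: str, now: float) -> bool:
    """Accept only the code for the current time step."""
    return hmac.compare_digest(totp(secret, now), candidate)

shared_secret = b"12345678901234567890"   # example secret used in the RFC test vectors
print(totp(shared_secret, time.time()))
```

Because both sides derive the code from the shared secret and the current time, an intercepted password stops being useful once the time step has passed. A real deployment would usually also tolerate a small amount of clock drift, which this sketch does not.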
It was mentioned that if the intranet wanted access to the internet, virtual private networks are often used. Intranets that exist across multiple locations generally run over separate leased lines, or a newer approach of VPN can be utilized. A VPN is a private network that uses a public network (usually the Internet) to connect remote sites or users together. Instead of using a dedicated, real-world connection such as a leased line, a VPN uses "virtual" connections routed through the Internet from the company's private network to the remote site or employee. Figure 3 is a graphical representation of an organization and VPN network.
Figure 3: A typical VPN might have a main LAN at the corporate headquarters of a company, other LANs at remote offices or facilities and individual users connecting from out in the field. [14]
CURRENT DEVELOPMENTS IN NETWORK SECURITY
The network security field is continuing down the same route. The same methodologies are being used with the addition of biometric identification. Biometrics provides a better method of authentication than passwords. This might greatly reduce the unauthorized access of secure systems. New technology such as the smart card is surfacing in research on network security. The software aspect of network security is very dynamic. New firewalls and encryption schemes are constantly being implemented.
The research being performed assists in understanding current development and projecting the future developments of the field.
1. Hardware Developments
Hardware is not developing rapidly. Biometric systems and smart cards are the only new hardware technologies that are widely impacting security.
The most obvious use of biometrics for network security is for secure workstation logons for a workstation connected to a network. Each workstation requires some software support for biometric identification of the user as well as, depending on the biometric being used, some hardware device. The cost of hardware devices is one thing that may lead to the widespread use of voice biometric security identification, especially among companies and organizations on a low budget. Hardware devices such as computer mice with built-in thumbprint readers would be the next step up. These devices would be more expensive to implement on several computers, as each machine would require its own hardware device. A biometric mouse, with the software to support it, is available from around $120 in the U.S. The advantage of voice recognition software is that it can be centralized, thus reducing the cost of implementation per machine. At the top of the range a centralized voice biometric package can cost up to $50,000 but may be able to manage the secure log-in of up to 5000 machines.
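The cost comparison can be made concrete with a little arithmetic. The figures are the ones quoted in the text ($120 per biometric mouse, up to $50,000 for a centralized voice package covering up to 5000 machines); the function names are illustrative, and the calculation ignores installation and support costs.

```python
MOUSE_COST = 120              # USD per machine, from the text
VOICE_PACKAGE_COST = 50_000   # USD, top-of-range centralized package
VOICE_CAPACITY = 5_000        # machines one package may manage

def mouse_total(machines: int) -> int:
    """Total hardware cost when every machine gets its own biometric mouse."""
    return MOUSE_COST * machines

def voice_per_machine(machines: int) -> float:
    """Cost per machine when one centralized package covers all machines."""
    if not 0 < machines <= VOICE_CAPACITY:
        raise ValueError("machine count outside the package's capacity")
    return VOICE_PACKAGE_COST / machines

def break_even_machines() -> int:
    """Smallest machine count at which the centralized package is cheaper overall."""
    return VOICE_PACKAGE_COST // MOUSE_COST + 1

print(voice_per_machine(VOICE_CAPACITY))   # per-machine cost at full capacity
print(break_even_machines())               # machines needed before the package pays off
```

At full capacity the centralized package works out to $10 per machine against $120 for a mouse, while below roughly 400 machines the per-machine devices remain the cheaper option on these figures.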
The main use of biometric network security will be to replace the current password system. Maintaining password security can be a major task for even a small organization. Passwords have to be changed every few months and people forget their password or lock themselves out of the system by incorrectly entering their password repeatedly. Very often people write their password down and keep it near their computer. This of course completely undermines any effort at network security. Biometrics can replace this security identification method. The use of biometric identification stops this problem and while it may be expensive to set up at first, these devices save on administration and user assistance costs.
Smart cards are usually credit-card-sized digital electronic media. The card itself is designed to store encryption keys and other information used in authentication and other identification processes. The main idea behind smart cards is to provide undeniable proof of a user's identity. Smart cards can be used for everything from logging in to the network to providing secure Web communications and secure e-mail transactions.
It may seem that smart cards are nothing more than a repository for storing passwords. Obviously, someone can easily steal a smart card from someone else. Fortunately, there are safety features built into smart cards to prevent someone from using a stolen card. Smart cards require anyone who is using them to enter a personal identification number (PIN) before they'll be granted any level of access into the system. The PIN is similar to the PIN used by ATM machines.
When a user inserts the smart card into the card reader, the smart card prompts the user for a PIN. This PIN was assigned to the user by the administrator at the time the administrator issued the card to the user. Because the PIN is short and purely numeric, the user should have no trouble remembering it and therefore would be unlikely to write the PIN down.
But the interesting thing is what happens when the user inputs the PIN. The PIN is verified from inside the smart card. Because the PIN is never transmitted across the network, there's absolutely no danger of it being intercepted. The main benefit, though, is that the PIN is useless without the smart card, and the smart card is useless without the PIN.
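One way to picture why the PIN never needs to cross the network is a challenge-response exchange in which the card checks the PIN locally and only then answers a server challenge with a keyed hash. The sketch below is a simplified model under that assumption; the `SmartCard` class, its three-attempt lockout, and the use of HMAC-SHA256 with a key shared with the server are hypothetical choices, not a description of any real card.

```python
import hashlib
import hmac
import secrets

class SmartCard:
    """Toy model: the secret key and the PIN check stay inside the card."""

    def __init__(self, pin: str, key: bytes, max_attempts: int = 3):
        self._pin = pin
        self._key = key
        self._attempts_left = max_attempts

    def respond(self, pin: str, challenge: bytes) -> bytes:
        """Verify the PIN locally, then sign the server's challenge."""
        if self._attempts_left == 0:
            raise PermissionError("card locked")
        if not hmac.compare_digest(pin.encode(), self._pin.encode()):
            self._attempts_left -= 1
            raise PermissionError("wrong PIN")
        return hmac.new(self._key, challenge, hashlib.sha256).digest()

def server_check(key: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the keyed hash and compare in constant time."""
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

key = secrets.token_bytes(32)          # provisioned on the card and known to the server
card = SmartCard(pin="4821", key=key)
challenge = secrets.token_bytes(16)    # fresh random value for each login
print(server_check(key, challenge, card.respond("4821", challenge)))
```

Only the challenge and the response travel over the network. A thief with the card but not the PIN cannot produce a response, and someone who learns the PIN but lacks the card has no key to sign with.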
There are other security issues of the smart card. The smart card is cost-effective but not as secure as the biometric identification devices.
2. Software Developments
The software aspect of network security is very vast. It includes firewalls, antivirus, VPN, intrusion detection, and much more. The research development of all security software is not feasible to study at this point. The goal is to obtain a view of where the security software is heading based on the emphasis being placed now.
The way standard security software is improved remains the same. When new viruses emerge, the antivirus is updated to be able to guard against those threats. This process is the same for firewalls and intrusion detection systems. Many of the research papers reviewed were based on analyzing attack patterns in order to create smarter security software.
As the security hardware transitions to biometrics, the software also needs to be able to use the information appropriately. Current research is being performed on security software using neural networks. The objective of the research is to use neural networks for facial recognition software.
Many small and complex devices can be connected to the internet. Most of the current security algorithms are computationally intensive and require substantial processing power. This power, however, is not available in small devices like sensors. Therefore, there is a need for designing lightweight security algorithms. Research in this area is currently being performed.
FUTURE TRENDS IN SECURITY
What is going to drive Internet security is the set of applications more than anything else. In the future, security may come to resemble an immune system. The immune system fights off attacks and builds itself to fight tougher enemies. Similarly, network security will be able to function as an immune system.
The trend towards biometrics could have taken place a while ago, but it seems that it isn't being actively pursued. Many security developments that are taking place are within the same set of security technology that is being used today with some minor adjustments.
CONCLUSION
Network security is an important field that is increasingly gaining attention as the internet expands. The security threats and internet protocol were analyzed to determine the necessary security technology. The security technology is mostly software based, but many common hardware devices are used. The current development in network security is not very impressive.
Originally it was assumed that with the importance of the network security field, new approaches to security, both hardware and software, would be actively researched. It was a surprise to see most of the development taking place in the same technologies being currently used. The embedded security of the new internet protocol IPv6 may provide many benefits to internet users. Although some security issues were observed, the IPv6 internet protocol seems to evade many of the current popular attacks. Combined use of IPv6 and security tools such as firewalls, intrusion detection, and authentication mechanisms will prove effective in guarding intellectual property for the near future. The network security field may have to evolve more rapidly to deal with the threats further in the future.
REFERENCES
[1] Dowd, P.W.; McHenry, J.T., "Network security: it's time to take it seriously," Computer, vol. 31, no. 9, pp. 24-28, Sep 1998
[2] Kartalopoulos, S.V., "Differentiating Data Security and Network Security," Communications, 2008. ICC '08. IEEE International Conference on, pp. 1469-1473, 19-23 May 2008
[3] Security Overview, www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/security-guide/ch-sgs-ov.html.
[4] Molva, R., Institut Eurecom, "Internet Security Architecture," in Computer Networks & ISDN Systems Journal, vol. 31, pp. 787-804, April 1999
[5] Sotillo, S., East Carolina University, IPv6 security issues, August 2006, www.infosecwriters.com/text_resources/pdf/IPv6_SSotillo.pdf.
[6] Andress, J., IPv6: the next internet protocol, April 2005, www.usenix.com/publications/login/2005-04/pdfs/andress0504.pdf.
[7] Warfield, M., Security Implications of IPv6, Internet Security Systems White Paper, documents.iss.net/whitepapers/IPv6.pdf
[8] Adeyinka, O., "Internet Attack Methods and Internet Security Technology," Modeling & Simulation, 2008. AICMS 08. Second Asia International Conference on, pp. 77-82, 13-15 May 2008
[9] Marin, G.A., "Network security basics," Security & Privacy, IEEE, vol. 3, no. 6, pp. 68-72, Nov.-Dec. 2005
[10] Internet History Timeline, www3.baylor.edu/~Sharon_P_Johnson/etg/inthistory.htm.
[11] Landwehr, C.E.; Goldschlag, D.M., "Security issues in networks with Internet access," Proceedings of the IEEE, vol. 85, no. 12, pp. 2034-2051, Dec 1997
[12] "Intranet." Wikipedia, The Free Encyclopedia. 23 Jun 2008, 10:43 UTC. Wikimedia Foundation, Inc. 2 Jul 2008 <http://en.wikipedia.org/w/index.php?title=Intranet&oldid=221174244>.
[13] "Virtual private network." Wikipedia, The Free Encyclopedia. 30 Jun 2008, 19:32 UTC. Wikimedia Foundation, Inc. 2 Jul 2008 <http://en.wikipedia.org/w/index.php?title=Virtual_private_network&oldid=222715612>.
[14] Tyson, J., How Virtual private networks work, http://www.howstuffworks.com/vpn.htm.
[15] Al-Salqan, Y.Y., "Future trends in Internet security," Distributed Computing Systems, 1997., Proceedings of the Sixth IEEE Computer Society Workshop on Future Trends of, pp. 216-217, 29-31 Oct 1997
[16] Curtin, M., Introduction to Network Security, http://www.interhack.net/pubs/network-security.
[17] Improving Security, http://www.cert.org/tech_tips, 2006.
[18] Serpanos, D.N.; Voyiatzis, A.G., "Secure network design: A layered approach," Autonomous Decentralized System, 2002. The 2nd International Workshop on, pp. 95-100, 6-7 Nov. 2002
[19] Ohta, T.; Chikaraishi, T., "Network security model," Networks, 1993. International Conference on Information Engineering '93. 'Communications and Networks for the Year 2000', Proceedings of IEEE Singapore International Conference on, vol. 2, pp. 507-511, 6-11 Sep 1993
Live Leak - IBPS PO Prelims 2016 Model Question Paper based on Predicted Pattern
Quantitative Aptitude
1. Simplify:
0.75 × 0.75 + 0.25 × 0.75 × 4 + 0.50 × 0.50
a. 1
b. 1.25
c. 1.5625
d. 1.5
e. None of these
2. What should come in place of the question mark (?) in the following question?
83% of 1700 + 42% of 2150 = (?)^3 + 117
a. 13
b. 14
c. 16
d. 15
e. None of these
3. What will come in place of the question mark (?) in the following question?
146% of 250 + ? % of 550 = 805
a. 60
b. 70
c. 50
d. 75
e. None of these
4. What will come in place of question mark in the following question?
(216)^4 ÷ (36)^4 × (6)^5 = (6)^?
a. 27
b. 81
c. 18
d. 9
e. None of these
5. 6573 ÷ (70% of 30) × (0.2)^2 = ?
a. 7825
b. 62.6
c. 1565
d. 12.52
e. None of these
6. ? ÷ 26 × 65 = 50% of 2,210
a. 424
b. 478
c. 456
d. 442
e. None of these
7. What should come in place of the question mark (?) in the following question
√(7921) × 51 + 50% of 748 = (?)^3
a. 16
b. 19
c. 15
d. 21
e. None of these
2 +21
8. Find the value of 2+1 2

a. 3/2
b. 2n/2
c. 3
d.
e. 2
9. What will come in place of question mark (?) in the following question?
(16 × 4)^3 ÷ (4)^5 × (2 × 8)^2 = (4)^?
a. 64
b. 32
c. 16
d. 8
e. None of these
10. What approximate value will come in place of the question mark (?) in the following questions? (You
are not expected to calculate the exact value.)
79.99% of 1599 − 16.01% of 1399 = ?
a. 856
b. 976
c. 1056
d. 1256
e. 1176
11. What is the remainder when 587 is divided by 625?
a. 11
b. 120
c. 0
d. 125
e. None of these
12. Present age of Sudha and Neeta are in the ratio of 6 : 7 respectively. Five years ago their ages were in
the ratio of 5 : 6 respectively. What is the Sudha's present age?
a. 22
b. 41
c. 14
d. 32
e. None of these
13. The average of the sum of three consecutive even numbers and three consecutive odd numbers is 21.
If the highest even number is 16, what is the lowest odd number?
a. 5
b. 7
c. 9
d. 11
e. Data Incorrect
14. What should come in place of the question mark (?) in the following number series?
7 17 54 ? 1098 6591
a. 204
b. 212
c. 223
d. 219
e. None of these
15. What should come in place of question mark (?) in the following number series?
10 17 48 165 688 3275 ?
a. 27584
b. 25670
c. 21369
d. 20892
e. None of these
16. What will come in place of the question mark (?) in the following number series?
7
11
27
63
?
a. 96
b. 111
c. 99
d. 127
e. None of these
17. In the following number series only one number is wrong. Find out the wrong number.
19 68 102 129 145 154
a. 154
b. 129
c. 145
d. 102
e. None of these
18. Find the next term in the given series in each of the questions below.
336, 210, 120, ?, 24, 6, 0
a. 40
b. 50
c. 60
d. 70
e. None of these
19. A horse worth Rs. 15000 is sold by A to B at 10% profit. B sells the horse back to A at 5% loss. Then, in
the entire transaction.
a. A loses Rs. 825
b. A gains Rs. 825
c. A loses Rs. 425
d. A gains Rs. 425
e. None of these
20. If a team of 4 persons is to be selected from 8 males and 8 females, then in how many ways can
selections be made to include at least 1 female.
a. 3500
b. 1875
c. 1750
d. Cannot be determined
e. None of these
Directions: Study the following graph carefully and answer the questions given below it.
Total number of candidates appeared and qualified from various cities in an exam
[Bar graph: Number of Candidates (axis from 0 to 3500) by Cities, with two series, Appeared and Qualified]
21. The average number of candidates qualified in the examination from cities C and D together are what
percent of the average number of candidates appeared in the examination from the same cities? (
Rounded off to two digits after decimal)
a. 58.62
b. 73.91
c. 62.58
d. 58.96
e. None of these
22. What is the respective ratio of the number of students appeared to the number of candidates
qualified in the exam from City C?
a. 12 : 7
b. 6 : 5
c. 13 : 9
d. 9 : 13
e. None of these
23. What is the respective ratio of the number of candidates qualified in the examination from city A and
the number of candidates qualified in the examination from city B?
a. 8 : 3
b. 7 : 5
c. 7 : 3
d. 9 : 5
e. None of these
24. The number of candidates appeared in the exam from City D are approximately what percent of the
total number of candidates appeared for the exam from all the Cities together?
a. 12
b. 24
c. 29
d. 18
e. 8
25. What is the difference between the average number of candidates appeared in the exam from all
given cities and the average number of candidates qualified from all the given cities?
a. 950
b. 1100
c. 990
d. 1020
e. None of these
26. The profit earned when an article is sold for Rs. 800 is 20 times the loss incurred when it is sold for Rs.
275. At what price should the article be sold if it is desired to make a profit of 25%.
a. Rs. 325
b. Rs. 350
c. Rs. 375
d. Rs. 400
e. Rs. 425
27. Aisha and Palak can do a piece of work in 25 and 20 days respectively. They began the work together
but Palak left after some days and Aisha completed the rest of the work in 16 days. After how many
days did Palak leave?
a. 7 days
b. 4 days
c. 9 days
d. 12 days
e. None of these
28. The length of a room is 1.5 times its breadth. The cost of carpeting it at Rs. 150 per sq. meter is Rs.
14400 and the cost of white-washing the four walls at Rs. 5 per sq. meter is Rs. 625. Find the length
of the room.
a. 16 m
b. 12 m
c. 14 m
d. 10 m
e. None of these
29. A lion sees a deer. It estimates that the deer is 40 leaps away. The deer sees the lion and starts running,
with the lion in hot pursuit. If in every minute, the lion makes 6 leaps and the deer makes 8 leaps and
one leap of the lion is equal to 2 leaps of the deer. Find the time in which the deer is caught by the
lion (assume an open field with no trees)
a. 12 minutes
b. 15 minutes
c. 12.5 minutes
d. 20 minutes
e. None of these
30. A shopkeeper sold 12 cameras at a profit of 20% and 8 cameras at a profit of 10%. If he had sold all
the 20 cameras at a profit of 15%, then his profit would have been reduced by Rs. 36. What is the cost
price of each camera?
a. Rs 180
b. Rs 370
c. Rs 160
d. Rs 245
e. None of these
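Questions 26, 27 and 30 above each reduce to one linear equation. The sketch below sets them up with exact fractions. The variable names are illustrative, and it assumes that in question 27 Aisha works alone for the final 16 days and that in question 30 every camera has the same cost price.

```python
from fractions import Fraction

# Q26: profit at Rs. 800 equals 20 times the loss at Rs. 275.
# 800 - cp = 20 * (cp - 275)  =>  21 * cp = 800 + 20 * 275
cost_price = Fraction(800 + 20 * 275, 21)
price_for_25_percent_profit = cost_price * Fraction(125, 100)

# Q27: Aisha alone needs 25 days, Palak alone needs 20 days.
# Aisha's final 16 days cover 16/25 of the work; the rest was done jointly.
joint_rate = Fraction(1, 25) + Fraction(1, 20)
days_together = (1 - Fraction(16, 25)) / joint_rate

# Q30: 12 cameras at 20% plus 8 cameras at 10%, versus all 20 at 15%.
# Both profits are multiples of the unit cost; their gap is Rs. 36.
mixed_profit_factor = 12 * Fraction(20, 100) + 8 * Fraction(10, 100)
flat_profit_factor = 20 * Fraction(15, 100)
camera_cost = 36 / (mixed_profit_factor - flat_profit_factor)

print(price_for_25_percent_profit, days_together, camera_cost)
```

Using `Fraction` keeps every intermediate value exact, so the results can be compared directly with the listed options.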
Directions: Study the pie chart carefully to answer the following questions:
Percentage of students qualified for the first round in different activities in Inter-school competition
(total qualified students = 3000)
Dancing: 24%
Craft: 25%
Drawing: 14%
Singing: 21%
Swimming: 16%

Percentage break-up of girls qualified the first round of these activities out of the total students participated
(number of girls who qualified for the first round = 1750)
Dancing: 20%
Singing: 28%
Craft: 22%
Drawing: 16%
Swimming: 14%
31. What is the approximate percentage of boys qualified for the first round in the Inter-school
competition?
a. 34
b. 56
c. 28
d. 50
e. 42
32. How many boys did qualify the first round in Singing and Craft together?
a. 505
b. 610
c. 885
d. 720
e. None of these
33. What is the total number of girls who qualify the first round in Swimming and Drawing together?
a. 480
b. 525
c. 505
d. 495
e. None of these
34. Number of girls who qualified for the first round in Dancing is what percent of total number of
students, qualified for the first round in the Inter-school competition?
a. 12.35
b. 14.12
c. 11.67
d. 10.08
e. None of these
35. What is the respective ratio of number of girls qualified the first round in Swimming to the number of
boys qualified the first round in Swimming?
a. 47 : 49
b. 23 : 29
c. 29 : 23
d. 49 : 47
e. None of these
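Questions 31 to 35 all follow from one table of per-activity counts. The sketch below builds that table from the two pie charts. It assumes that the second chart's percentages are shares of the 1750 qualified girls, and that boys are the remainder in each activity. The dictionary names are illustrative.

```python
TOTAL_QUALIFIED = 3000
GIRLS_QUALIFIED = 1750

# Percentage shares read from the two pie charts.
total_share = {"Dancing": 24, "Craft": 25, "Drawing": 14, "Singing": 21, "Swimming": 16}
girl_share = {"Dancing": 20, "Singing": 28, "Craft": 22, "Drawing": 16, "Swimming": 14}

# Every product here is a multiple of 100, so integer division is exact.
students = {a: TOTAL_QUALIFIED * p // 100 for a, p in total_share.items()}
girls = {a: GIRLS_QUALIFIED * p // 100 for a, p in girl_share.items()}

# Assumption: boys in an activity = all qualified students minus girls.
boys = {a: students[a] - girls[a] for a in students}

for activity in students:
    print(activity, students[activity], girls[activity], boys[activity])
```

With the table in hand, each question is a lookup followed by a sum, a ratio or a percentage of 3000.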
Reasoning Ability
36-40. Directions: Study the following information carefully and answer the given questions.
In a certain code language 'ja hu fi' means 'Box is empty', 'ka hu ni ma' means 'Box full of chocolates',
'fi ni na mi' means 'He is eating chocolates', 'ka ba na fi ma' means 'He is full of enthusiasm'.
36. Which of the following means he in that code language?

a. Hu
b. Ni
c. Na
d. Ja
e. mi
37. Code ka is for which word in the given language?
a. Full
b. Box
c. Of
d. either (a) or (c)
e. either (a) or (b)
38. What would be the code for Chocolate is full of energy, if code for chocolate and chocolates is same?
a. ka fi ma ni xa
b. xa ka ma ja ni
c. xa fi ba ka ma
d. fi ma hu xa mi
e. None of these
39. Code mi is for which word in the given language?
a. Is
b. He
c. Chocolates
d. Eating
e. full
40. He is eating yummy food is coded as mi du na fi bu. What is the code for yummy in the given code?
a. Bu
b. Du
c. Mi
d. either (a) or (b)
e. either (a) or (c)
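The usual method for questions 36 to 40 is to compare sentences pairwise: codes shared by two sentences must stand for the words they share, and codes unique to one sentence stand for the words unique to it. The sketch below automates that comparison. It assumes each word has exactly one code, and the function name `decode` is illustrative.

```python
from itertools import combinations

sentences = [
    ("ja hu fi", "Box is empty"),
    ("ka hu ni ma", "Box full of chocolates"),
    ("fi ni na mi", "He is eating chocolates"),
    ("ka ba na fi ma", "He is full of enthusiasm"),
]

def decode(sentences):
    """Return the code-to-word pairs that can be fixed uniquely."""
    pairs = [(set(c.split()), set(w.lower().split())) for c, w in sentences]
    # Each group links a set of codes to the equally sized set of words it covers.
    groups = list(pairs)
    for (c1, w1), (c2, w2) in combinations(pairs, 2):
        groups.append((c1 & c2, w1 & w2))  # shared codes, shared words
        groups.append((c1 - c2, w1 - w2))  # only in the first sentence
        groups.append((c2 - c1, w2 - w1))  # only in the second sentence
    mapping = {}
    changed = True
    while changed:
        changed = False
        for codes, words in groups:
            open_codes = codes - set(mapping)
            open_words = words - set(mapping.values())
            # One unknown code facing one unknown word settles both.
            if len(open_codes) == 1 and len(open_words) == 1:
                mapping[open_codes.pop()] = open_words.pop()
                changed = True
    return mapping

code_to_word = decode(sentences)
print(code_to_word)
```

Codes that never end up alone in a group stay out of the result, which is how an "either ... or" option arises.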
41-43. Directions: These questions are based on the following information.
R is the brother of G.
Q is the sister of R.
O is the brother of N.
N is the daughter of G.
L is the father of Q.
41. Who is the uncle of O?
a. R
b. L
c. G
d. Q
e. N
42. Who is the father of N?
a. G
b. R
c. Q
d. N
e. Cannot be determined
43. How many male members are there in the family?
a. 2
b. 3
c. 4
d. 1
e. Cannot be determined
44-45. Directions: Study the following information carefully and answer the questions given below.
There are five students by name Pushkar, Ashu, Pranshu, Chetna and Sakshi each securing different
marks in a subject. Ashu secured more marks than only Pranshu. Sakshi and Chetna secured less marks
than only Pushkar.
44. Who among them secured third least marks?
a. Pushkar
b. Chetna
c. Ashu
e. None of these
45. If Pushkar secured 80 marks and Chetna secured 65 marks then what is the possibility of Sakshi's
marks?
a. 78
b. 63
c. 60
e. None of these
46. Directions: In the question, relationship between different elements is shown in the statements.
The statements are followed by four conclusions. Give answer.
Statements:
H = P, Y M, M W, P < Y
Conclusions:
I. W > Y
II. M < P
III. M H
a. None is true
b. Only I is true
c. Only II is true
d. Only III is true
e. Only I and II is true
47. The statements are followed by two conclusions.
Statements:
E C = B; D = C M > N
Conclusions:
I) D = B
II) N E
III) N > E
a. Only (I)
b. Only (I) and either (II) or (III)
c. Both (I) and (III)
d. Both (I) and (II)
e. None of these
48. Directions: In this question, the relationship between different elements is shown in the
statements. These statements are followed by three conclusions. Mark your answer from the given
options.
Statements:
L M > K, Z = K < P
Conclusions:
I. Z L
II. M = P
III. Z < M
a. Only I is true
b. Only II is true
c. Only III is true
d. Both I and III are true
e. None of these
49. The statements are followed by two conclusions.
Statements:
B J; K < L > M; J < K; G H > B
Conclusions:
I. L > J
II. J < M
a. Only conclusion I is true.
b. Only conclusion II is true.
c. Either conclusion I or II is true.
d. Neither conclusion I nor II is true.
e. Both the conclusion I and II are true.
50. Directions: In the question, relationship between different elements is shown in the statements.
These statements are followed by conclusions.
Statements:
P> U V >R; X < Y = R> Z
Conclusions:
I. Z > U
II. R < P
a. Only conclusion I follows.
b. Only conclusion II follows.
c. Either conclusion I or conclusion II follows.
d. Neither conclusion I nor conclusion II follows.
e. Both conclusion I and II follow.
51. Directions: In the following question, three statements are given followed by four conclusions I, II,
III and IV. You have to consider the given statements to be true even, if they seem to be at variance
with commonly known facts. Read all the conclusions and decide which of the following logically
follows from the given statements disregarding the commonly known facts.
Statements:
All forests are towns.
All towns are villages.
All cities are villages.

Conclusions:
I. All villages are not towns.
II. Some cities are definitely towns.
III. Some cities are forests.
IV. No city is a forest.
a. Either III or IV follows
b. I and II follow
c. I and III follow
d. I and IV follow
e. All follow
52. Directions: In the question below are given three statements followed by four conclusions
numbered I, II, III and IV. You have to take the given statements to be true even if they seem to be
at variance with commonly known facts. Read all the conclusions and then decide which of the
given conclusions logically follows from the given statements disregarding commonly known facts.
Statements:
Some rats are deer.
No deer is a lion.
All emus are lions.
Conclusions:
I. No rat is lion.
II. No emu is deer.
III. Some emus are rats.
IV. Some deer are emu.
a. All follow
b. Only II follows
c. Either I or II follow
d. Only II and III follow
e. Either I or IV follow
53. Directions: In the question below there are three statements followed by four conclusions
numbered I, II, III and IV. You have to take the given statements to be true even if they seem to be
at variance from commonly known facts. Read all the conclusions and then decide which of the
given conclusion logically follows from the given statements, disregarding commonly known facts.
Statements:
Some dogs are rats.
All rats are trees.
Some trees are not dogs.
Conclusions:
I. Some trees are dogs.

II. All dogs are trees.
III. All rats are dogs.
IV. No tree is dog.
a. None follows
b. Only I follow
c. Only I and II follow
d. Only II and III follow
e. All follow
54. Directions: In the question below, two statements are given followed by two conclusions numbered
I and II. You have to take the two statements to be true even if they seem to be at variance from
the commonly known facts and then decide which of the given conclusions logically follows from
the given statements disregarding the commonly known facts.
Statements:
Some tricks are magic.
All magic are true.
Conclusions:
I. There is a possibility that all tricks are true.
II. There is a possibility that all magic are tricks.
a. Only conclusion I follows.
c. Either conclusion I or conclusion II follows.
d. Neither conclusion I nor conclusion II follows.
e. Both conclusions I and II follow.
55. Directions: The question consists of two statements followed by two conclusions numbered I and
II. You have to take the two given statements to be true even if they seem to be at variance from
commonly known facts and then decide which of the given conclusions logically follows from the
given statements disregarding commonly known facts.
Statements:
No dip is a fin.
All dips are fans.
Conclusions:
I. All fans are dips.
II. No fin is fan.
a. Only conclusion I follows.
c. Either I or II follows.
d. Neither I nor II follow.

e. Both I and II follow.
Eight members A, B, C, D, E, F, G and H of a family are sitting around a circular table with all of them
facing outwards. Each one of them has a different brand of mobile viz. Nokia, Apple, Asus, HTC,
Micromax, Xiaomi, Motorola and Xolo. There are exactly 3 married couples in the family.
1) D is the mother of A and E and sits second to the left of E.
2) A who is the father of F and uncle of G sits to the left of person owning Nokia.
3) H is the only sister-in-law of A whereas B owns Xiaomi and is daughter-in-law of C.
4) The one who owns an Apple sits between G and the owner of Micromax. G is third to the left of D.
5) F is an immediate neighbor of her aunt H who does not sit next to D.
6) The two youngest members sit next to each other.
7) The Xiaomi owner sits between Motorola and Xolo owner.
8) D's husband and son sit next to her. C does not own Xolo or Motorola.
9) G does not have Asus or Motorola.
10) HTC is not owned by G's father.
56. Which of the following is correct regarding the family?
a. A is the brother of H.
b. C is the father of A.
c. B is the aunt of F.
d. F and G are married couple.
e. None of these.
57. Who among the following is not a husband wife pair?
a. A and B
b. E and H
c. A and H
d. D and C
e. None of these
58. Who among the following owns Asus?
a. A
b. E
c. C
d. D
e. H
59. What is the position of HTC owner with respect to the Xiaomi owner?
a. Third to the left
b. Second to the right
c. Immediate left
d. Third to the right
e. Fourth to the left
60. Who among the following sits between B and owner of Apple?
a. A
b. E
c. G
d. D
e. H
61-65. Directions: Study the following information carefully and answer the questions given below.
Ten persons A, B, C, D, E, F, G, H, J and K are sitting in two rows with five persons in each row. The
persons in row one are facing south and the persons in row two are facing north. Each person in row
one faces a person from the other row. All of them have a mobile of different companies, viz M1, M2,
M3, M4, M5, M6, M7, M8, M9 and M10, but not necessarily in the same order.
The persons who like M5 and M6 sit opposite each other. F sits opposite A, who likes M1. The one who
likes M2 sits opposite the one who likes M8. K is not facing north but sits third to the left of G, who likes
M2. There is only one person between B and C. E sits at one of the ends of the row and likes M6. The
one who likes M8 is on the immediate right of D, who does not like M10.
The persons who like M3 and M4 respectively are not facing north. C likes M7. The one who likes M4 sits
opposite the one who is second to the right of B. J does not like M10. E sits opposite the one who sits
second to the left of the one who likes M3.
61. Four of the following five are alike in a certain way and hence form a group. Which is the one that
does not belong to that group?
a. H, G
b. E, C
c. D, B
d. K, J
e. J, C
62. Which of the following statements is/are true?
a. H has M10 and sits at one of the ends of the row.
b. F is the immediate neighbour of G and the person who has M5.
c. The one who has M7 sits on the immediate left of the one who has M6.
d. Only a) and c) are true
e. None of the above

63. B likes which of the following Mobile?
a. M5
b. M8
c. M10
d. M4
64. How many persons sit between D and E?
a. One
b. Two
c. Three
d. Can't be determined
65. Who among the following sit at the extreme ends of the row and are facing south?
a. D, E
b. J, C
c. K, E
d. H, J
There are seven girls, G1, G2, G3, G4, G5, G6 and G7 who participated in a singing competition which
started on Monday and ended on Sunday. In the first round of the competition, each of them performed
regional songs, R1, R2, R3, R4, R5, R6 and R7, but not necessarily in the same order. They like different
colors, C1, C2, C3, C4, C5, C6 and C7, but not necessarily in the same order.
G2 did not perform on the day either immediately before or immediately after the performance of G4,
who does not like either C5 or C1 or C4 color. Two performances were held between the performance of
G7 and G6, neither of whom performed on the Monday. There was one performance between the
performance of G4 and G3. But G3's performance did not happen either on Monday or on Wednesday.
G3 likes C2 color and performed R1 song. The one, who performed R3 song on the last day of
competition, likes C3 color. G5 performed immediately after G3 and she likes C1 color. G1 does not like
C5 color and performed a R4 song. G4 did not perform either R7 or R6 song. The one who performed in
R5 was scheduled immediately after the performance of the R1 singer. G7, who likes C6, performed on
the fourth day of the competition but performed neither R1 nor R2 song.
66. Who among the following performed on the day before the performance of G2?
a. The person who performed R3 song
b.
c.
d.
e.
The person who likes C4 color

None
67. Which of the following combinations is definitely false regarding their schedule?
a. G3-C2-R1
b. G4-C7-R2
c. G1-C4-R4
d. G7-C6-R3
e. None of these
68. Who sings R6 song?
a. G7
b. G4
c. G2
d. Either G2 or G7
e. None of these
69. If G7 is related to C2, G5 is related to C3, which of the following is G4 related to?
a. C5
b. C6
c. C4
d. C7
e. None of these
70. Which of the following combinations is true?
a. G1's performance was held on the fifth day of the week.
b. G5 likes C6 color.
c. G4 sings R1 song.
d. G5's performance was scheduled before G3 but after G6.
e. None of these.
English Language
71-80. Directions: Read the passage given below and then answer the questions given below the
passage. Some words may be highlighted for your attention. Pay careful attention.
India is right now in the midst of an inflationary episode that has gone on for 17 months. It began in
December 2009, when the WPI inflation climbed to 7.15%, it continued to rise, peaked in April 2010, at
just short of 11%. Thereafter, it has been on a broadly downward trajectory. What has caused some
concern once again is that there was a small pick-up in inflation in December 2011 and also because the
downward trajectory has been disappointingly slow. Before this 17-month run, we had one year of
negligible inflation; but just prior to that there was another rally from March 2008 to December 2008,
when WPI inflation hovered in and around 10%. Before these two rallies in quick succession, India had
very little inflation for a dozen years. There were occasional months when inflation would exceed 8% and
not a single month when it was in double digits during these twelve years of relative price stability.
For reasons of completeness it may be mentioned that independent India's highest inflation occurred in
September 1974, when inflation reached 33.3%. Arguably our worst inflationary episode was from
November 1973 to December 1974, when inflation never dropped below 20% and was above 30% for four
consecutive months starting June 1974.
What is good performance and what is bad depends on the yardstick. Even during the dozen years of price
stability we had more inflation than in virtually any industrialized country in recent times, but in
comparison to most emerging market economies and developing nations in the world, India's
performance was creditable.
One reason for the concern with the past 17 months' inflation run is the fact that since the mid nineteen
nineties and all the way till 2006 we had price stability. This concern has led to the talk of runaway inflation
and hyperinflation. It is however important to get the perspective right. We are nowhere near
hyperinflation, usually described as inflation over 50% per month (Cagan, 1956). The world's biggest
inflations occurred in Europe, once around 1923 and again around 1946. The record is held by Hungary
from August 1945 to July 1946. During these twelve months, prices rose by 3.8 × 10^27. That is, what cost
1 pengo on August 1st, 1945, would cost 38000... (a total of 26 such zeroes) pengos on 31 July 1946. In
August 1946 the pengo was replaced with the forint in an effort to shed the trillions of zeroes that were
needed to express prices in pengos.
Comparable inflations have occurred in Russia from December 1921 to January 1924, in Greece in 1943,
in Zimbabwe in 2008, in Germany in 1923 and in many other instances. The German hyperinflation of
1923 may well be the most analyzed and diagnosed inflation. It played havoc with the economy, created
political tensions which contributed to the rise of Nazism, and also caused psychological disturbances.
Doctors in Germany in 1923 identified a mental illness called "cipher stroke" which many people were
afflicted with during the height of the hyperinflation. It referred to a neurotic urge to keep writing zeroes
and also to a propensity to meaninglessly add zeroes when responding to routine questions, such as to
say "two trillion" when asked how many children the person has (Ahamed, 2009).
Not quite as large as these European inflations but nevertheless staggeringly big ones occurred till two or
three decades ago in many Latin American countries (see Garcia, Guillen and Kehoe, 2010). These being
closer to our times and having an economy which is progressing gradually, may have greater relevance to
us. One country that has coped with mega inflations, many times larger than what we have in India, and
seems to have at last stabilized, and is now among the forerunners of well-run economies among
emerging market economies, is Brazil.
71. When did the current inflationary period start in India and for how long has it existed?
a. December 2010; 17 months
b. December 2009; 17 months
c. December 2010; 18 months
d. December 2009; 18 months
e. January 2009; 17 months
72. Choose the correct meaning of the word "trajectory" as used in the passage.
a. Station
b. Crossroad
c. Road
d. Course
e. Immobility
73. Which year did independent India achieve the highest inflation and at what rate?
a. September 1974; 33.6%
b. August 1984; 33.6%
c. August 1974; 33.3%
d. September 1984; 36.3%
e. September 1974; 33.3%
74. Choose the correct meaning of the word "succession" as used in the passage.
a. Accession
b. Promotion
c. Sequence
d. Elevation
e. Advancement
75. What was identified as a psychological disease due to inflation and in which year?
a. Cipher stroke in 1923
b. Schizophrenia in 1923
c. Oedipus complex in 1823
d. Dementia in 1900
e. Kleptomania in 1920
76. Choose the correct meaning of the word "yardstick" as used in the passage.
a. Paradigm
b. Hypothesis
c. Umbrage
d. Offensive
e. Standard
77. Why are the inflations in Latin American countries more relevant to the Indian economy than those
in Europe?
a. Latin American countries and India share similar economic conditions and the reason behind
inflation in Latin American countries are similar to that in India.
b. India and Latin American countries share similar economic patterns and Latin American
countries have faced the inflations in recent times closer to the time period when India faced
inflation.
c. Both India and Latin American countries are developing economies and the inflation periods
faced by them are closer in time than those faced by the developed European countries.
d. Inflations in Latin American countries are not at all relevant to those in India.
e. India, being more developed economy-wise than most Latin American countries, does not have any
inflation issues.
78. Choose the correct meaning of the word "afflicted" as underlined in the paragraph in context to the
whole from the given options.
a. Troubled
b. Burdened
c. Patient
d. Abandoned
e. Enduring
79. Which one of the following statements is true in the context of the passage?
a. The inflation rate in India was around 10% in April 2010 and then started going down.
b. India is isolated from the world economy.
c. India was under a deflationary stage prior to 2010 but the rate has gone up since then.
d. The Indian economy is highly stabilized and hyperinflation is not present.
e. After April 2010, the Indian economy started facing hyperinflation as inflation rates continued to
increase.
80. What is the major theme discussed in the passage?
a. Various forms of inflation.
b. Comparison of inflation rates in India at present with that of the past.
c. Comparison of inflation rates in India with that of other emerging countries.
d. Concerns about inflation in the Indian economy.
e. Both (a) and (d).
81-85. Directions: Rearrange the following six sentences A, B, C, D, E and F in a proper sequence to
form a meaningful paragraph and then answer the questions given below.
A. The bank said in a statement that customers can log into its website and avail personal loans in a
minute.
B. This facility neither requires the customer to visit the branch nor does he have to put his manual
signature on any document.
C. Upon the acceptance of the offer by the customer, the loan money is credited to his account
immediately.
D. Moreover, the facility can be accessed round the clock.
E. The first phase of the launch will see this product being rolled out to our existing select customers
pan India.
F. Digital personal loans comes as third in the series of BYOM (Be Your Own Master) digital retail loans
the bank has rolled out, previous offerings being digital car loans and loan against term deposit.
81. Which of the following should be the THIRD sentence after rearrangement?
a. B
b. F
c. C
d. E
e. A
82. Which of the following should be the FIRST sentence after rearrangement?
a. F
b. B
c. C
d. A
e. E
83. Which of the following should be the FIFTH sentence after rearrangement?
a. A
b. B
c. C
d. F
e. E
84. Which of the following should be the FOURTH sentence after rearrangement?
a. A
b. B
c. C
d. D
e. E
85. Which of the following should be the SECOND sentence after rearrangement?
a. A
b. B
c. C
d. D
e. F
86-90. Directions: Below, a passage is given with five blanks labelled (A)-(E). Below the passage, five
options are given for each blank. Choose the word that fits each blank most appropriately in the
context of the passage, and mark the corresponding answer.
Gita Press, and the monthly magazine it published, Kalyan, were ___(A)___ in the mid-1920s. Most
such ___(B)___ of that era are now ___(C)___, except these two. As of early 2014, the press had sold 72
million copies of Tulsidas's Ramcharitmanas and other works, and 94.8 million copies of ___(D)___ on
the ideal Hindu woman and child. As of today, Kalyan has a circulation of over 2,00,000 and its
English ___(E)___ Kalyan-Kalpataru, over 1,00,000.
86. Which of the following words most appropriately fits the blank labelled (A)?
a. Founded
b. Invented
c. Generated
d. Foreseen
e. Consecrated
87. Which of the following words most appropriately fits the blank labelled (B)?
a. Investments
b. Experiments
c. Propositions
d. Ventures
e. Deals
88. Which of the following words most appropriately fits the blank labelled (C)?
a. Inanimate
b. Defunct
c. Departed
d. Exterminated
e. Existing
89. Which of the following words most appropriately fits the blank labelled (D)?
a. Leaflet
b. Thesauruses
c. Telegrams
d. Monographs
e. Musings
90. Which of the following words most appropriately fits the blank labelled (E)?
a. Contemporary
b. Replica
c. Transcript
d. Message
e. Counterpart
91-95. Read each sentence to find out whether there is any error in it. The error, if any, will be in one
part of the sentence. The number of that part is the answer. If there is no error, the answer is (e).
Ignore errors of punctuation, if any.
91. My sister-in-law (a)/ along with her daughter (b)/ were present (c)/ at the party. (d)/ No Error (e)
92. The teacher said that each students (a)/ would receive a failing grade (b)/ unless they (c)/ owned up
to the prank. (d)/ No Error (e)
93. The greatest love songs (a)/ are not written (b)/ from the brain (c)/ and from the heart. (d)/ No Error
(e)
94. English is (a)/ the toughest and the (b)/ easier language to (c)/ learn in the world. (d)/ No Error (e)
95. Audrey was the (a)/epitome of all things (b)/ beautiful and familiar, (c)/ because distant. (d)/ No
Error (e)
96-100. Each of the questions below has two blanks, each blank indicating that something has been
omitted. Choose the set of words for each blank which best fit the meaning of the sentence as a
whole.
96. The house was found in ______ and he was believed to be absconding with a ______ of valuable
jewellery.
a. woods, truck
b. shambles, load
c. unlocked state, load
d. disarray, haul
e. untouched, bundle
97. ______ and ______ should not be tolerated in our country which speaks of Ahimsa as its principle
of life.
a. Politicking, elections
b. Dishonour, efficiency
c. Lethargy, procrastination
d. Nepotism, selfishness
e. Hatred, violence
98. An efficient management will decide not only the _________ for equipment but also its ______ for
deciding priorities.
a. technology, methodology
b. cost, value
c. usefulness, utility
d. need, urgency
e. requirement, necessities
99. Now that the mammoth is extinct, the elephant is the _________ and the _______ of all animals
living.
a. Largest, Weakest
b. Largest, Lightest
c. Largest, Strongest
d. Largest, Smallest
e. Largest, Delicacy
100. There is no _____ to ______.
a. way, hell
b. shame, people
c. tragedy, stories
d. shortcut, success
e. number, venue