
Assignment No. 1
Title of Assignment:
Implement a system using multivalued Attributes and Inheritance in ORDBMS.

Relevant Theory / Literature Survey:

ORDBMS Definition
An object relational database is also called an object
relational database management system (ORDBMS). This system
simply puts an object oriented front end on a relational
database (RDBMS). When applications interface to this type
of database, it will normally interface as though the data
is stored as objects. However the system will convert the
object information into data tables with rows and columns
and handle the data the same as a relational database.
Likewise, when the data is retrieved, it must be reassembled
from simple data into complex objects.

About Oracle Objects and Object Types


Oracle object types are user-defined data types that make it
possible to model complex real-world entities such as
customers and purchase orders as unitary
entities--"objects"--in the database.
Oracle object technology is a layer of abstraction built on
Oracle's relational technology. New object types can be
created from any built-in database types and any previously
created object types, object references, and collection
types. Metadata for user-defined types is stored in a schema
that is available to SQL, PL/SQL, Java, and other published
interfaces.
Object types and related object-oriented features such as
variable-length arrays and nested tables provide higher-level
ways to organize and access data in the database.
Underneath the object layer, data is still stored in columns
and tables, but you are able to work with the data in terms
of the real-world entities--customers and purchase orders,
for example--that make the data meaningful. Instead of
thinking in terms of columns and tables when you query the

database, you can simply select a customer.


Internally, statements about objects are still basically
statements about relational tables and columns, and you can
continue to work with relational data types and store data
in relational tables as before. But now you have the option
to take advantage of object-oriented features too. You can
begin to use object-oriented features while continuing to
work with most of your data relationally, or you can go over
to an object-oriented approach entirely. For instance, you
can define some object data types and store the objects in
columns in relational tables. You can also create object
views of existing relational data to represent and access
this data according to an object model. Or you can store
object data in object tables, where each row is an object.

Advantages of Objects
In general, the object-type model is similar to the class
mechanism found in C++ and Java. Like classes, objects make
it easier to model complex, real-world business entities and
logic, and the reusability of objects makes it possible to
develop database applications faster and more efficiently.
By natively supporting object types in the database, Oracle
enables application developers to directly access the data
structures used by their applications. No mapping layer is
required between client-side objects and the relational
database columns and tables that contain the data. Object
abstraction and the encapsulation of object behaviors also
make applications easier to understand and maintain.
Below are listed several other specific advantages that
objects offer over a purely relational approach:
  - Objects Can Encapsulate Operations Along with Data
  - Objects Are Efficient
  - Objects Can Represent Part-Whole Relationships

Basic Components of Oracle Objects


Object-Relational Elements
Object-relational functionality introduces a number of new
concepts and resources. These are briefly described in the
following sections.

Object Types
An object type is a kind of data type. You can use it in the
same ways that you use more familiar data types such as
NUMBER or VARCHAR2. For example, you can specify an object
type as the data type of a column in a relational table, and
you can declare variables of an object type. You use a
variable of an object type to contain a value of that object
type. A value of an object type is an instance of that type.
An object instance is also called an object.
Object types also have some important differences from the
more familiar data types that are native to a relational
database:

  - A set of object types does not come ready-made with the
    database. Instead, you define the object types you want.
  - Object types are not unitary: they have parts, called
    attributes and methods.

You can think of an object type as a structural blueprint or
template and an object as an actual thing built according to
the template.

Type Inheritance
You can specialize an object type by creating subtypes that
have some added, differentiating feature, such as an
additional attribute or method. You create subtypes by
deriving them from a parent object type, which is called a
super type of the derived subtypes.
Subtypes and super types are related by inheritance: as
specialized versions of their parent, subtypes have all the
parent's attributes and methods plus any specializations
that are defined in the subtype itself. Subtypes and super
types connected by inheritance make up a type hierarchy.
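For illustration, here is a minimal sketch of such a type hierarchy in Oracle SQL (the type and attribute names are hypothetical, not from the original text; type inheritance requires Oracle 9i or later). The supertype must be declared NOT FINAL so that subtypes can be derived from it with UNDER:

CREATE TYPE person_t AS OBJECT (
    name  VARCHAR2(30),
    phone VARCHAR2(20)
) NOT FINAL;
/
-- The subtype inherits name and phone and adds its own
-- differentiating attribute.
CREATE TYPE student_t UNDER person_t (
    roll_no NUMBER
);
/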

Objects
When you create a variable of an object type, you create an
instance of the type: the result is an object. An object has
the attributes and methods defined for its type. Because an
object instance is a concrete thing, you can assign values
to its attributes and call its methods.

Design Analysis / Implementation Logic:


Implementation:
Object Tables
An object table is a special kind of table in which each row
represents an object.
For example, the following statements create a person object
type and define an object table for person objects:
CREATE TYPE person AS OBJECT (
    name   VARCHAR2(30),
    phone  VARCHAR2(20) );
CREATE TABLE person_table OF person;
You can view this table in two ways:
  - As a single-column table in which each row is a person
    object, allowing you to perform object-oriented operations
  - As a multi-column table in which each attribute of the
    object type person, namely name and phone, occupies a
    column, allowing you to perform relational operations

For example, you can execute the following instructions:


INSERT INTO person_table VALUES (
    'John Smith',
    '1-800-555-1212' );
SELECT VALUE(p) FROM person_table p
    WHERE p.name = 'John Smith';
The first statement inserts a person object into person_table,
treating person_table as a multi-column table. The second selects
from person_table as a single-column table, using the VALUE
function to return rows as object instances.

Varrays
An array is an ordered set of data elements. All elements of a
given array are of the same data type. Each element has an
index, which is a number corresponding to the element's
position in the array.
The number of elements in an array is the size of the array.
Oracle allows arrays to be of variable size, which is why
they are called varrays. You must specify a maximum size when
you declare the array type.
For example, the following statement declares an array type:
CREATE TYPE prices AS VARRAY(10) OF NUMBER(12,2);
The VARRAYs of type PRICES have no more than ten elements,
each of datatype NUMBER(12,2).
Creating an array type does not allocate space. It defines a
datatype, which you can use as:
  - The datatype of a column of a relational table.
  - An object type attribute.
  - The type of a PL/SQL variable, parameter, or function
    return value.

A varray is normally stored in line, that is, in the same
tablespace as the other data in its row. If it is
sufficiently large, Oracle stores it as a BLOB.
A varray cannot contain LOBs. This means that a varray also
cannot contain elements of a user-defined type that has a
LOB attribute.
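As an illustration, a short sketch (the table and column names are hypothetical) of using the prices varray as a column type and populating it with the type's constructor:

CREATE TABLE catalog (
    item_id    NUMBER,
    price_list prices );

-- The varray value is built with the prices() type constructor.
INSERT INTO catalog VALUES ( 1, prices(19.99, 24.99, 29.99) );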

Nested Tables
A nested table is an unordered set of data elements, all of the
same datatype. It has a single column, and the type of that
column is a built-in type or an object type. If the column
in a nested table is an object type, the table can also be
viewed as a multi-column table, with a column for each
attribute of the object type.
For example, in the purchase order example, the following
statement declares the table type used for the nested tables
of line items:
CREATE TYPE lineitem_table AS TABLE OF lineitem;
A table type definition does not allocate space. It defines
a type, which you can use as:
  - The datatype of a column of a relational table.
  - An object type attribute.
  - A PL/SQL variable, parameter, or function return type.

When a column in a relational table is of nested table type,
Oracle stores the nested table data for all rows of the
relational table in the same storage table. Similarly, with
an object table of a type that has a nested table attribute,
Oracle stores nested table data for all object instances in
a single storage table associated with the object table.
For example, the following statement defines an object table
for the object type PURCHASE_ORDER:
CREATE TABLE purchase_order_table OF purchase_order
NESTED TABLE lineitems STORE AS lineitems_table;
The second line specifies LINEITEMS_TABLE as the storage table
for the LINEITEMS attributes of all of the PURCHASE_ORDER
objects in PURCHASE_ORDER_TABLE.
A convenient way to access the elements of a nested table
individually is to use a nested cursor.
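The lineitem and purchase_order types used above are not defined in this write-up. A minimal sketch of what they might look like (the attribute names are hypothetical, and these CREATE TYPE statements would have to run before the lineitem_table and purchase_order_table statements above), together with a query that unnests the line items with a TABLE() expression:

CREATE TYPE lineitem AS OBJECT (
    item_name VARCHAR2(30),
    quantity  NUMBER,
    price     NUMBER(12,2) );
/
CREATE TYPE purchase_order AS OBJECT (
    po_number NUMBER,
    lineitems lineitem_table );
/
-- Unnest the line items of one purchase order.
SELECT li.item_name, li.quantity
    FROM purchase_order_table po, TABLE(po.lineitems) li
    WHERE po.po_number = 1001;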

Testing:
  - The object person is created with name & phone no.
  - The person_table is created using the object person.
  - The nested table purchase_order_table with the
    purchase_order object and lineitems is created.

Conclusion:
Multivalued attributes and inheritance in ORDBMS are implemented.

Assignment No.2
Title of Assignment:
Implement K-Means Data Mining Clustering Algorithm.
Relevant Theory / Literature Survey: (Brief Theory Expected)

What is K-Means Clustering?


In simple words, it is an algorithm to classify or to group
your objects based on attributes/features into K number of
groups, where K is a positive integer. The grouping is done by
minimizing the sum of squares of distances between each data
point and the corresponding cluster centroid. Thus, the purpose
of k-means clustering is to classify the data.
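Stated formally (a standard formulation added here for clarity, not from the original text), with clusters C_1, ..., C_K and centroids \mu_k, k-means minimizes the within-cluster sum of squares:

J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2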

Step-by-step k-means clustering algorithm

Step 1. Begin with a decision on the value of k = number of
clusters.
Step 2. Put any initial partition that classifies the data
into k clusters. You may assign the training samples
randomly, or systematically as follows:
1. Take the first k training samples as single-element
clusters.
2. Assign each of the remaining (N-k) training samples to
the cluster with the nearest centroid. After each
assignment, recompute the centroid of the gaining
cluster.
Step 3. Take each sample in sequence and compute its
distance from the centroid of each of the clusters. If a
sample is not currently in the cluster with the closest
centroid, switch this sample to that cluster and update the
centroid of the cluster gaining the new sample and the
cluster losing the sample.
Step 4. Repeat step 3 until convergence is achieved, that
is, until a pass through the training samples causes no new
assignments.
If the number of data points is less than the number of
clusters, then we assign each data point as the centroid of
a cluster.

Each centroid will have a cluster number. If the number of
data points is bigger than the number of clusters, then for
each data point we calculate the distance to all centroids
and take the minimum. A data point is said to belong to the
cluster from which it has the minimum distance.

Applications of k-means clustering

There are many applications of k-means clustering, ranging
from unsupervised learning in neural networks to pattern
recognition, classification analysis, artificial
intelligence, image processing, machine vision, etc. In
principle, whenever you have several objects, each with
several attributes, and you want to classify the objects
based on the attributes, you can apply this algorithm.

Design Analysis / Implementation Logic:

Numerical Example of K-Means Clustering


The basic step of k-means clustering is simple. In the
beginning we determine the number of clusters K and we assume
the centroid or center of these clusters. We can take any
random objects as the initial centroids, or the first K
objects in sequence can also serve as the initial centroids.
Then the k-means algorithm will do the three steps below
until convergence.
Iterate until stable (= no object moves group):
1. Determine the centroid coordinates
2. Determine the distance of each object to the centroids
3. Group the objects based on minimum distance
Suppose we have several objects (4 types of medicines) and
each object has two attributes or features, as shown in the
table below. Our goal is to group these objects into K=2
groups of medicine based on the two features (pH and weight
index).
Object        Attribute 1 (X): weight index   Attribute 2 (Y): pH
Medicine A    1                               1
Medicine B    2                               1
Medicine C    4                               3
Medicine D    5                               4

Each medicine represents one point with two attributes (X,
Y) that we can represent as a coordinate in an attribute
space, as shown in the figure below.

1. Initial value of centroids: Suppose we use medicine A
and medicine B as the first centroids. Let c1 and c2 denote
the coordinates of the centroids; then c1 = (1, 1) and
c2 = (2, 1).

2. Objects-Centroids distance: We calculate the distance
between each cluster centroid and each object. Let us use
Euclidean distance; then the distance matrix at iteration 0
is

    D0 = | 0     1     3.61  5    |   (distances to c1 = (1, 1))
         | 1     0     2.83  4.24 |   (distances to c2 = (2, 1))
           A     B     C     D

Each column in the distance matrix symbolizes one object.
The first row of the distance matrix corresponds to the
distance of each object to the first centroid and the second
row is the distance of each object to the second centroid.
For example, the distance from medicine C = (4, 3) to the
first centroid c1 = (1, 1) is sqrt((4-1)^2 + (3-1)^2) = 3.61,
and its distance to the second centroid c2 = (2, 1) is
sqrt((4-2)^2 + (3-1)^2) = 2.83, etc.
3. Objects clustering: We assign each object based on the
minimum distance. Thus, medicine A is assigned to group 1,
medicine B to group 2, medicine C to group 2 and medicine D
to group 2. The element of the Group matrix below is 1 if
and only if the object is assigned to that group:

    G0 = | 1   0   0   0 |   (group 1)
         | 0   1   1   1 |   (group 2)
           A   B   C   D

4. Iteration-1, determine centroids: Knowing the members of
each group, now we compute the new centroid of each group
based on these new memberships. Group 1 only has one member,
thus the centroid remains c1 = (1, 1). Group 2 now has three
members, thus the centroid is the average coordinate among
the three members: c2 = ((2+4+5)/3, (1+3+4)/3) = (11/3, 8/3).

5. Iteration-1, Objects-Centroids distances: The next step
is to compute the distance of all objects to the new
centroids. Similar to step 2, the distance matrix at
iteration 1 is

    D1 = | 0     1     3.61  5    |   (distances to c1 = (1, 1))
         | 3.14  2.36  0.47  1.89 |   (distances to c2 = (11/3, 8/3))
           A     B     C     D

6. Iteration-1, Objects clustering: Similar to step 3, we
assign each object based on the minimum distance. Based on
the new distance matrix, we move medicine B to group 1
while all the other objects remain. The Group matrix is
shown below:

    G1 = | 1   1   0   0 |   (group 1)
         | 0   0   1   1 |   (group 2)
           A   B   C   D

7. Iteration-2, determine centroids: Now we repeat step 4 to
calculate the new centroid coordinates based on the
clustering of the previous iteration. Group 1 and group 2
both have two members, thus the new centroids are
c1 = ((1+2)/2, (1+1)/2) = (1.5, 1) and
c2 = ((4+5)/2, (3+4)/2) = (4.5, 3.5).

8. Iteration-2, Objects-Centroids distances: Repeating step 2
again, we have the new distance matrix at iteration 2:

    D2 = | 0.5   0.5   3.20  4.61 |   (distances to c1 = (1.5, 1))
         | 4.30  3.54  0.71  0.71 |   (distances to c2 = (4.5, 3.5))
           A     B     C     D

9. Iteration-2, Objects clustering: Again, we assign each
object based on the minimum distance, and we obtain G2 = G1.
Comparing the grouping of the last iteration and this
iteration reveals that the objects no longer move between
groups. Thus, the computation of the k-means clustering has
reached its stability and no more iterations are needed. We
get the final grouping as the result:
Object        Feature 1 (X): weight index   Feature 2 (Y): pH   Group (result)
Medicine A    1                             1                   1
Medicine B    2                             1                   1
Medicine C    4                             3                   2
Medicine D    5                             4                   2

Testing:

When the user clicks the picture box to input new data (X, Y),
the program will group/cluster the data by minimizing the
sum of squares of distances between each data point and the
corresponding cluster centroid. Each dot represents an
object, and the coordinates (X, Y) represent the two
attributes of the object. The colour of the dot and the
label number represent the cluster.

Conclusion:
Thus, all the user data points (X, Y) are grouped into three
clusters by minimizing the sum of squares of distances
between each data point and the corresponding cluster
centroid.

Assignment No. 3
Title of Assignment:
Design a Web-based application using ASP involving a Database.

Relevant Theory / Literature Survey:

The need for ASP


Why bother with ASP at all, when HTML can serve your needs?
If you want to display information, all you have to do is
fire up your favorite text editor, type in a few HTML tags,
and save it as an HTML file.
But wait: what if you want to display information that
changes? Suppose you're writing a page that provides
constantly changing information to your visitors, for
example weather reports, stock quotes, a list of your
girlfriends, etc. HTML can no longer keep up with the pace.
What you need is a system that can present dynamic
information. And ASP fits the bill perfectly.

What is Active Server Pages?


Active Server Pages (ASPs) are Web pages that contain
server-side scripts in addition to the usual mixture of text
and HTML tags. Server-side scripts are special commands you
put in Web pages that are processed before the pages are
sent from the server to the web-browser of someone who's
visiting your website. When you type a URL in the Address
box or click a link on a webpage, you're asking a web-server
on a computer somewhere to send a file to the web-browser
(also called a "client") on your computer. If that file is a
normal HTML file, it looks the same when your web-browser
receives it as it did before the server sent it. After
receiving the file, your web-browser displays its contents
as a combination of text, images, and sounds.
In the case of an Active Server Page, the process is
similar, except there's an extra processing step that takes
place just before the server sends the file. Before the
server sends the Active Server Page to the browser, it runs
all server-side scripts contained in the page. Some of these
scripts display the current date, time, and other
information. Others process information the user has just
typed into a form, such as a page in the website's
guestbook. And you can write your own code to put in
whatever dynamic information you want. To distinguish Active
Server Pages from normal HTML pages, Active Server
Pages are given the ".asp" extension.

Requirements to run ASP


Since the server must do additional processing on the ASP
scripts, it must have the ability to do so. The only servers
which support this facility are Microsoft Internet
Information Services & Microsoft Personal Web Server. Let us
look at both in detail, so that you can decide which one is
most suitable for you.
Internet Information Services
This is Microsoft's web server designed for the Windows NT
platform. It can only run on Microsoft Windows NT 4.0,
Windows 2000 Professional, & Windows 2000 Server. The
current version is 5.0, and it ships as a part of the
Windows 2000 operating system.
Personal Web Server
This is a stripped-down version of IIS and supports most of
the features of ASP. It can run on all Windows platforms,
including Windows 95, Windows 98 & Windows Me. Typically,
ASP developers use PWS to develop their sites on their own
machines and later upload their files to a server running
IIS. If you are running Windows 9x or Me, your only option
is to use Personal Web Server 4.0.

The Object Model


ASP is a scripting environment revolving around its Object
Model. An Object Model is simply a hierarchy of objects that
you may use to get services from. In the case of ASP, all
commands are issued to certain inbuilt objects, that
correspond to the Client Request, Client Response, the
Server, the Session & the Application respectively. All of
these are for global use

Request: To get information from the user
Response: To send information to the user
Server: To control the Internet Information Server
Session: To store information about and change settings for
the user's current Web-server session
Application: To share application-level information and
control settings for the lifetime of the application
The Request and Response objects contain collections (bits
of information that are accessed in the same way). Objects
use methods to do some type of procedure (if you know any
object-oriented programming language, you know already what
a method is) and properties to store any of the object's
attributes (such as color, font, or size).

Design Analysis / Implementation Logic:

Implementation:
Database Connectivity
<HTML>
<HEAD>
</HEAD>
<BODY>
<%
Dim DB
Set DB = Server.CreateObject("ADODB.Connection")
DB.Open "PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=C:\Databases\Students.mdb"
Dim RS
Set RS = Server.CreateObject("ADODB.Recordset")
RS.Open "SELECT * FROM Students", DB
%>
</BODY>
</HTML>
The first few lines are the opening HTML tags for any page.
There's no ASP code within them. The ASP block begins with
the statement,
Dim DB
which is a declaration of the variable that we are going to
use later on. The second line,
Set DB = Server.CreateObject("ADODB.Connection")
does the following two things:
Firstly, the right-hand-side statement,
Server.CreateObject() is used to create an instance of a COM
object which has the ProgID ADODB.Connection. The Set
Statement then assigns this reference to our variable, DB.
Now, we use the object just created to connect to the
database using a Connection String.
The string,
"PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=C:\Databases\Students.mdb"
is a string expression that tells our object where to locate
the database, and more importantly, what type the database
is: whether it is an Access database, a Sybase database, or
Oracle. (Please note that this is a Connection String
specific to Access 2000 databases. This example does not use
ODBC.)
If the DB.Open statement succeeds without an error, we have
a valid connection to our database under consideration. Only
after this can we begin to use the database.
The immediate next lines,
Dim RS
Set RS = Server.CreateObject("ADODB.Recordset")
serve the same purpose as the lines for creating the
ADODB.Connection object. Only now we're creating an
ADODB.Recordset! Now,

RS.Open "SELECT * FROM Students", DB


is perhaps the most important line of this example. Given an
SQL statement, this line executes the query, and assigns the
records returned to our Recordset object. The bare-minimum
syntax, as you can see, is pretty straightforward. Of
course, the Recordset.Open(...) method takes a couple
more arguments, but they are optional, and would just
complicate things at this juncture.
Inserting Data into a Table
<HTML>
<HEAD>
<TITLE>Student Records</TITLE>
</HEAD>
<BODY>
<!-- #include file="adovbs.inc" -->
<%
' adovbs.inc (included above) defines the ADO constants
' adModeReadWrite, adOpenStatic, and adLockPessimistic.
Dim DB
Set DB = Server.CreateObject("ADODB.Connection")
DB.Mode = adModeReadWrite
DB.Open "PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=C:\Databases\Students.mdb"
Dim RS
Set RS = Server.CreateObject("ADODB.Recordset")
RS.Open "Students", DB, adOpenStatic, adLockPessimistic

RS.AddNew
RS("FirstName") = "Kavitha"
RS("LastName") = "Nair"
RS("Email") = "kavitha@kavithanair.com"
RS("DateOfBirth") = CDate("4 Feb, 1980")
RS.Update
%>
</BODY>
</HTML>

Updating Records
<HTML>
<HEAD>
<TITLE>Student Records</TITLE>
</HEAD>
<BODY>
<!-- #include file="adovbs.inc" -->
<%
Dim DB
Set DB = Server.CreateObject("ADODB.Connection")
DB.Mode = adModeReadWrite
DB.Open "PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=C:\Databases\Students.mdb"
Dim RS
Set RS = Server.CreateObject("ADODB.Recordset")
RS.Open "SELECT * FROM Students WHERE FirstName = 'Kavitha'", DB, adOpenStatic, adLockPessimistic
RS("Email") = "mynewemail@kavithanair.com"
RS("DateOfBirth") = CDate("4 Feb, 1980")
RS.Update
%>
</BODY>
</HTML>
Deleting Records
<HTML>
<HEAD>
<TITLE>Student Records</TITLE>
</HEAD>
<BODY>
<!-- #include file="adovbs.inc" -->
<%
Dim DB
Set DB = Server.CreateObject("ADODB.Connection")
DB.Mode = adModeReadWrite
DB.Open "PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=C:\Databases\Students.mdb"
DB.Execute "DELETE * FROM Students WHERE FirstName = 'Kavitha'"
%>
</BODY>
</HTML>

Retrieving Data
<HTML>
<HEAD>
<TITLE>Student Records</TITLE>
</HEAD>
<BODY>
<%
Dim DB
Set DB = Server.CreateObject("ADODB.Connection")
DB.Open "PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=C:\Databases\Students.mdb"
Dim RS
Set RS = Server.CreateObject("ADODB.Recordset")
RS.Open "SELECT * FROM Students", DB
If RS.EOF And RS.BOF Then
    Response.Write "There are 0 records."
Else
    RS.MoveFirst
    While Not RS.EOF
        Response.Write RS.Fields("FirstName")
        Response.Write RS.Fields("LastName")
        Response.Write "<HR>"
        RS.MoveNext
    Wend
End If
%>
</BODY>
</HTML>

Testing:
1. Insert data into Student Database.
2. Establish the connectivity with the database.
3. Insert record, Delete record, Update record and retrieve
   data from the database.

Conclusion:
A web-based application for student registration is
implemented with ASP. The application also performs adding a
new student, deleting a student and modifying a student's
record.

Assignment No.4
Title of Assignment:
To create a simple multi-dimensional cube.
Relevant Theory / Literature Survey:
Installation of Analysis Services of MSSQL 2000 is the
primary requirement. When we installed MSSQL 2000 Analysis
Services, Analysis Manager was also installed as a tool.
What is a Cube?
Cubes are the main objects in online analytic processing
(OLAP), a technology that provides fast access to data in a
data warehouse. A cube is a set of data that is usually
constructed from a subset of a data warehouse and is
organized and summarized into a multidimensional structure
defined by a set of dimensions and measures. A cube
provides an easy-to-use mechanism for querying data with
quick and uniform response times.
Every cube has a schema, which is the set of joined tables
in the data warehouse from which the cube draws its source

data. The central table in the schema is the fact table, the
source of the cube's measures. The other tables are
dimension tables, the sources of the cube's dimensions.
A cube is defined by the measures and dimensions that it
contains. For example, a cube for sales analysis includes
the measures Item_Sale_Price and Item_Cost and the
dimensions Store_Location, Product_Line, and Fiscal_Year.
This cube enables end users to separate Item_Sale_Price and
Item_Cost into various categories by Store_Location,
Product_Line, and Fiscal_Year.
Each cube dimension can contain a hierarchy of levels to
specify the categorical breakdown available to end users.
For example, the Store_Location dimension includes the level
hierarchy: Continent, Country, Region, State_Province, City,
Store_Number. Each level in a dimension is of finer
granularity than its parent. For example, continents contain
countries, and states or provinces contain cities.
Similarly, the hierarchy of the Fiscal_Year dimension
includes the levels Year, Quarter, Month, and Day.

Dimension levels are a powerful data modeling tool because
they allow end users to ask questions at a high level and
then expand a dimension hierarchy to reveal more detail.
Cubes are immediately subordinate to the database in the
object hierarchy. A database is a container for related
cubes and the objects they share. You must create a database
before you create a cube.
Data warehousing Objects
Fact tables and dimension tables are the two types of
objects commonly used in dimensional data warehouse schemas.
Fact tables are the large tables in your warehouse schema
that store business measurements. Fact tables typically
contain facts and foreign keys to the dimension tables. Fact
tables represent data, usually numeric and additive, that
can be analyzed and examined.
Dimension tables, also known as lookup or reference tables,
contain the relatively static data in the warehouse.
Dimension tables store the information you normally use to
constrain queries.
Star Schema
The star schema is the simplest data warehouse schema. It is
called a star schema because the diagram resembles a star,
with points radiating from a center. The center of the star
consists of one or more fact tables and the points of the
star are the dimension tables.
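As a small illustration (the table and column names are hypothetical, not from the original text), a star schema for the sales cube described above could be sketched in SQL as one fact table whose foreign keys radiate out to the dimension tables:

-- Dimension tables: relatively static lookup data.
CREATE TABLE store_dim   ( store_id   INT PRIMARY KEY, city VARCHAR(40), state_province VARCHAR(40) );
CREATE TABLE product_dim ( product_id INT PRIMARY KEY, product_line VARCHAR(40) );
CREATE TABLE time_dim    ( time_id    INT PRIMARY KEY, fiscal_year INT, quarter INT, month INT );

-- Fact table: additive measures plus foreign keys to the dimensions.
CREATE TABLE sales_fact (
    store_id        INT REFERENCES store_dim(store_id),
    product_id      INT REFERENCES product_dim(product_id),
    time_id         INT REFERENCES time_dim(time_id),
    item_sale_price DECIMAL(10,2),
    item_cost       DECIMAL(10,2)
);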
Hierarchies
Hierarchies are logical structures that use ordered levels
as a means of organizing data. A hierarchy can be used to
define data aggregation.
Design Analysis / Implementation Logic:

The assignment includes


1. Prepare Analysis Services, as our environment, for the
cube model we intend to design;
2. Create the basic cube model;
3. Perform dimension design and other steps as part of the
cube creation process;
4. Save the model;
5. Design storage for the cube we have planned;
6. Process the cube and
7. Overview basic cube browse functionality.
Testing:
(Input/ Output):

Conclusion:
A simple multi-dimensional cube is created and studied.

Assignment No.5
Title of Assignment:
Study of LDAP (Lightweight Directory Access Protocol)
Relevant Theory / Literature Survey:

Directory Service
A Directory is like a database: you can put information in,
and later retrieve it. But it is specialized. Some typical
characteristics are: designed for reading more than writing,
offers a static view of the data, simple updates without
transactions. Directories are tuned to give quick-response
to high-volume lookup or search operations.
A Directory Service sports all of the above, plus a network
protocol used to access the directory. And perhaps also a
replication scheme, a data distribution scheme.
The Lightweight Directory Access Protocol (LDAP) is a
protocol for accessing online directory services. It runs
directly over TCP, and can be used to access directory
services back-ended by X.500, standalone LDAP directory
services or other kinds of directory servers.
X.500
LDAP was originally developed as a front end to X.500, the
OSI directory service. X.500 defines the Directory Access
Protocol (DAP) for clients to use when contacting directory
servers. DAP is a heavyweight protocol that runs over a full
OSI stack and requires a significant amount of computing
resources to run. LDAP runs directly over TCP and provides
most of the functionality of DAP at a much lower cost. This
use of LDAP makes it easy to access the X.500 directory.
X.500 in more depth
In X.500, the namespace is explicitly stated and is
hierarchical. Such namespaces require relatively complicated
management schemes. The naming model defined in X.500 is
concerned mainly with the structure of the entries in the
namespace, not the way the information is presented to the
user. Every entry in a X.500 Directory Information Tree, or
DIT, is a collection of attributes, each attribute composed
of a type element and one or more value elements.
The X.500 standard defines 17 object classes for directories
as a baseline. Being extensible, X.500 directories may
include other objects defined by implementors. The 17 basic
object classes include:

Alias
Country
Locality
Organization
Organizational Unit
Person

Objects in these object classes are defined by their
attributes. Some of the basic 40 attribute types include:

Common Name (CN)
Organization Name (O)
Organizational Unit Name (OU)
Locality Name (L)
Street Address (SA)
State or Province Name (S)
Country (C)

Putting this all together, an unambiguous entry for an
addressee would be specified by its distinguished name, say
{C=US, O=Acme, OU=Sales, CN=Fred}
Sample X.500 hierarchy. Starting at the highest level, or
Root, we can traverse the tree to successively lower levels,
called Country, Organization, and Common Name, for instance.
Applications and users access the directory via a directory
user agent, or DUA. A DUA transfers the directory request to
a DSA, or Directory System Agent, via DAP, the Directory
Access Protocol. The directory itself is composed of one or
more DSAs. The DSAs can either communicate among themselves
to share directory information or may perform what is called
a referral, i.e., direct the DUA to use a specific DSA.
Referrals may occur when DSAs are not set up to exchange
directory information, perhaps due to lack of interworking
agreements between the administrators, or for security
reasons.
LDAP
The LDAP standard defines:
  - a network protocol for accessing information in the
    directory. It defines the operations one may perform,
    e.g. search, add, delete, modify, change name. It also
    defines how operations and data are conveyed.
  - an information model defining the form and character of
    the information
  - a namespace defining how information is referenced and
    organized
  - an emerging distributed operation model defining how
    data may be distributed and referenced (v3)
Both the protocol itself and the information model are
extensible.

Data Types
Any data type can be put into the directory: text, photos,
URLs, pointers to whatever, binary data, public key
certificates.
Different types of data are held in attributes of different
types. Each attribute type has a particular syntax. The LDAP
standard describes a rich set of standard attribute types
and syntax (based on X.500's set). Plus, you may define your
own attributes, syntax, and even object classes -- you can
tailor your directory to your own site's specific needs.
The information model and namespace
They are based on Entries. An entry is simply a place where
one stores attributes. Each attribute has a type and one or
more values.
Entries themselves are "typed". This is accomplished by the
objectClass attribute.
The namespace is hierarchical, so it has the concept of
fully-qualified names called Distinguished Names (DN).

Here, the test entry's DN is "cn=test entry, ou=people,
dc=stanford, dc=edu".

Accessing an LDAP-based directory is accomplished by using a
combination of DN, filter, and scope. A base DN indicates
where in the hierarchy to begin the search. A filter
specifies attribute types, assertion values, and matching
criteria. A scope indicates what to search: the base DN
itself, one level below the base DN, or the entire sub-tree
rooted at the base DN.
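For example, with the OpenLDAP command-line client (the host name here is hypothetical), a sub-tree search combining a base DN, a scope, and a filter might look like:

ldapsearch -h ldap.foobar.com -b "dc=foobar,dc=com" -s sub "(cn=Fran Smith)"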
How does LDAP work?
LDAP directory service is based on a client-server model.
One or more LDAP servers contain the data making up the LDAP
directory tree. An LDAP client connects to an LDAP server
and asks it a question. The server responds with the answer,
or with a pointer to where the client can get more
information (typically, another LDAP server). No matter
which LDAP server a client connects to, it sees the same
view of the directory; a name presented to one LDAP server
references the same entry it would at another LDAP server.
This is an important feature of a global directory service,
like LDAP.
Key Points of LDAP:
  - LDAP is an extensive, vendor-independent, open, network
    PROTOCOL standard: so accessing data is done
    transparently across a highly heterogeneous network
    (i.e. the Internet).
  - An LDAP-based directory supports any type of data.
  - Can configure an LDAP-based directory to play
    essentially any role.
  - The LDAP protocol directly supports various forms of
    strong security (authentication, privacy, and
    integrity) technology.
  - Can use general-purpose directory technology, such as
    LDAP, to glue together disparate facets of cyberspace,
    e.g. email, security, white & yellow pages,
    directories, collaborative tools, MBone, etc.

Individual LDAP records

What's in a name? The DN of an LDAP entry


All entries stored in an LDAP directory have a unique
"Distinguished Name," or DN. The DN for each LDAP entry is
composed of two parts: the Relative Distinguished Name (RDN)
and the location within the LDAP directory where the record
resides.
The RDN is the portion of your DN that is not related to the
directory tree structure. Most items that you'll store in an
LDAP directory will have a name, and the name is frequently
stored in the cn (Common Name) attribute. Since nearly
everything has a name, most objects you'll store in LDAP
will use their cn value as the basis for their RDN. If I'm
storing a record for my favorite oatmeal recipe, I'll be
using cn=Oatmeal Deluxe as the RDN of my entry.

  - My directory's base DN is dc=foobar,dc=com
  - I'm storing all the LDAP records for my recipes in
    ou=recipes
  - The RDN of my LDAP record is cn=Oatmeal Deluxe

Given all this, what's the full DN of the LDAP record for
this oatmeal recipe? Remember, it reads backwards - just
like a host name in DNS.
cn=Oatmeal Deluxe,ou=recipes,dc=foobar,dc=com

Now it's time to tackle the DN of a company employee. For
user accounts, you'll typically see a DN based either on the
cn or on the uid (User ID). For example, the DN for FooBar's
employee Fran Smith (login name: fsmith) might look like
either of these two formats:
uid=fsmith,ou=employees,dc=foobar,dc=com   (login-based)
LDAP (and X.500) use uid to mean "User ID", not to be
confused with the UNIX uid number. Most companies try to
give everyone a unique login name, so this approach makes
good sense for storing information about employees. You
don't have to worry about what you'll do when you hire the
next Fran Smith, and if Fran changes her name (marriage?
divorce? religious experience?), you won't have to change
the DN of the LDAP entry.
cn=Fran Smith,ou=employees,dc=foobar,dc=com   (name-based)
Here we see the Common Name (CN) entry used. In the case of
an LDAP record for a person, think of the common name as
their full name. One can easily see the downside to this
approach: if the name changes, the LDAP record has to "move"
from one DN to another. As indicated above, you want to
avoid changing the DN of an entry whenever possible.
An example of an individual LDAP entry.
Let's look at an example. We'll use the LDAP record of Fran
Smith, an employee from Foobar, Inc. The format of this
entry is LDIF, the format used when exporting and importing
LDAP directory entries.
dn: uid=fsmith, ou=employees, dc=foobar, dc=com
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
objectclass: foobarPerson
uid: fsmith
givenname: Fran
sn: Smith
cn: Fran Smith
cn: Frances Smith
telephonenumber: 510-555-1234
roomnumber: 122G
o: Foobar, Inc.
mailRoutingAddress: fsmith@foobar.com
mailhost: mail.foobar.com
userpassword: {crypt}3x1231v76T89N
uidnumber: 1234
gidnumber: 1200
homedirectory: /home/fsmith
loginshell: /usr/local/bin/bash
To start with, attribute values are stored with case intact,
but searches against them are case-insensitive by default.
Certain attributes (like password) are case-sensitive when
searching.
Let's break this entry down and look at it piece by piece.
dn: uid=fsmith, ou=employees, dc=foobar, dc=com
This is the full DN of Fran's LDAP entry, including the
whole path to the entry in the directory tree. LDAP (and
X.500) use uid to mean "User ID," not to be confused with
the UNIX uid number.
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
objectclass: foobarPerson
One can assign as many object classes as are applicable to
any given type of object. The person object class requires
that the cn (common name) and sn (surname) fields have
values. Object Class person also allows other optional
fields, including givenname, telephonenumber, and so on. The
object class organizationalPerson adds more options to the
values from person, and inetOrgPerson adds still more options
to that (including email information). Finally, foobarPerson
is Foobar's customized object class that adds all the custom
attributes they wish to track at their company.
uid: fsmith
givenname: Fran
sn: Smith
cn: Fran Smith
cn: Frances Smith
telephonenumber: 510-555-1234
roomnumber: 122G
o: Foobar, Inc.
As mentioned before, uid stands for User ID. Just translate
it in your head to "login" whenever you see it.
Note that there are multiple entries for the CN. As
mentioned above, LDAP allows some attributes to have
multiple values, with the number of values being arbitrary.
When would you want this? Let's say you're searching the
company LDAP directory for Fran's phone number. While you
might know her as Fran (having heard her spill her guts over
lunchtime margaritas on more than one occasion), the people
in HR may refer to her (somewhat more formally) as Frances.
Because both versions of her name are stored, either search
will successfully look up Fran's telephone number, email,
cube number, and so on.
mailRoutingAddress: fsmith@foobar.com
mailhost: mail.foobar.com
Like most companies on the Internet, Foobar uses Sendmail
for internal mail delivery and routing. Foobar stores all

users' mail routing information in LDAP, which is fully


supported by recent versions of Sendmail.
userpassword: {crypt}3x1231v76T89N
uidnumber: 1234
gidnumber: 1200
gecos: Frances Smith
homedirectory: /home/fsmith
loginshell: /usr/local/bin/bash
Note that Foobar's systems administrators store all the NIS
password map information in LDAP as well. At Foobar, the
foobarPerson object class adds this capability. Note that
the user password is stored in UNIX crypt format, and the
UNIX uid is stored here as uidnumber.

Conclusion:
Thus, the Lightweight Directory Access Protocol is studied.

Assignment No 6 (a)
Title: Case Study of SQL SERVER
What is SQL Server?
SQL Server 2000 is a family of products designed to meet the
data storage requirements of large data processing systems
and commercial Web sites, as well as meet the ease-of-use
requirements of individuals and small businesses. At its
core, SQL Server 2000 provides two fundamental services to
the emerging Microsoft .NET platform, as well as in the
traditional two-tier client/server environment. The first
service is the SQL Server service, which is a highperformance, highly scalable relational database engine. The
second service is SQL Server 2000 Analysis Services, which
provides tools for analyzing the data stored in data
warehouses and data marts for decision support.

Microsoft SQL Server is a complete database and analysis
solution for rapidly delivering the next generation of
scalable Web applications.
SQL Server is a key component in supporting e-commerce,
line-of-business, and data warehousing applications, while
offering the scalability necessary to support growing,
dynamic environments.
SQL Server includes rich support for Extensible Markup
Language (XML) and other Internet language formats;
performance and availability features to ensure uptime; and
advanced management and tuning functionality to automate
routine tasks and lower the total cost of ownership.

The SQL Server 2000 Environment


The traditional client/server database environment consists
of client applications and a relational database management
system (RDBMS) that manages and stores the data. In this
traditional environment, the client applications that
provide the interface for users to access SQL Server 2000
are intelligent (or thick) clients, such as custom-written
Microsoft Visual Basic programs that access the data on SQL
Server 2000 directly using a local area network.
The emerging Microsoft .NET platform consists of highly
distributed, loosely connected, programmable Web services
executing on multiple servers. In this distributed,
decentralized environment, the client applications are thin
clients, such as Internet browsers, which access the data on
SQL Server 2000 through Web services such as Microsoft
Internet Information Services (IIS).

SQL Server 2000 Components


SQL Server 2000 provides a number of different types of
components. At the core are server components. These server
components are generally implemented as 32-bit Windows
services. SQL Server 2000 provides client-based graphical
tools and command-prompt utilities for administration. These
tools and utilities, as well as all other client
applications, use client communication components provided
by SQL Server 2000. The communication components provide
various ways in which client applications can access data
through communication with the server components. These
communication components are implemented as providers,
drivers, database interfaces, and Net-Libraries.

Server Components

The server components of SQL Server 2000 are normally
implemented as 32-bit Windows services. The SQL Server and
SQL Server Agent services may also be run as standalone
applications on any supported Windows operating system
platform.
Table lists the server components and briefly describes
their function. It also specifies how the component is
implemented when multiple instances are used.
Table: Server Components and Their Functions

SQL Server service:
    The MSSQLServer service implements the SQL Server 2000
    database engine. There is one service for each instance
    of SQL Server 2000.
Microsoft SQL Server 2000 Analysis Services service:
    The MSSQLServerOLAPService service implements SQL Server
    2000 Analysis Services. There is only one service,
    regardless of the number of instances of SQL Server 2000.
SQL Server Agent service:
    The SQLServerAgent service implements the agent that runs
    scheduled SQL Server 2000 administrative tasks. There is
    one service for each instance of SQL Server 2000.
Microsoft Search service:
    Microsoft Search implements the full-text search engine.
    There is only one service, regardless of the number of
    instances of SQL Server 2000.
Microsoft Distributed Transaction Coordinator (MS DTC) service:
    The Distributed Transaction Coordinator manages
    distributed transactions between instances of SQL Server
    2000. There is only one service, regardless of the number
    of instances of SQL Server 2000.

Client-Based Graphical Tools


Table lists the 32-bit graphical tools provided by SQL
Server 2000 and briefly describes their function.
Table: Graphical Tools in SQL Server 2000

SQL Server Enterprise Manager:
    The primary administrative tool for SQL Server; it
    provides a Microsoft Management Console (MMC) compliant
    user interface that helps you to perform a variety of
    administrative tasks:
      - Defining groups of servers running SQL Server
      - Registering individual servers in a group
      - Configuring all SQL Server options for each
        registered server
      - Creating and administering all SQL Server databases,
        objects, logins, users, and permissions in each
        registered server
      - Defining and executing all SQL Server administrative
        tasks on each registered server
      - Designing and testing SQL statements, batches, and
        scripts interactively by invoking SQL Query Analyzer
      - Invoking the various wizards defined for SQL Server
SQL Query Analyzer:
    A graphical tool that helps you to perform a variety of
    tasks:
      - Creating queries and other SQL scripts and executing
        them against SQL Server databases
      - Creating commonly used database objects from
        predefined scripts
      - Copying existing database objects
      - Executing stored procedures without knowing the
        parameters
SQL Profiler:
    A tool that captures SQL Server events from a server.
    The events are saved in a trace file that can later be
    analyzed or used to replay a specific series of steps
    when trying to diagnose a problem.
SQL Server Service Manager:
    A taskbar application used to start, stop, pause, or
    modify SQL Server 2000 services.
SQL Server Agent:
    Runs on the server that is running instances of SQL
    Server. SQL Server Agent is responsible for the
    following tasks:
      - Running SQL Server tasks that are scheduled to occur
        at specific times or intervals
      - Detecting specific conditions for which
        administrators have defined an action, such as
        alerting someone through pages or e-mail, or issuing
        a task that will address the conditions
      - Running replication tasks defined by administrators

The Relational Database Architecture


SQL Server 2000 data is stored in databases. Physically, a
database consists of two or more files on one or more disks.
This physical implementation is visible only to database
administrators, and is transparent to users. The physical
optimization of the database is primarily the responsibility
of the database administrator.
Logically, a database is structured into components that are
visible to users, such as tables, views, and stored
procedures. The logical optimization of the database (such
as the design of tables and indexes) is primarily the
responsibility of the database designer.

System and User Databases

Each instance of SQL Server 2000 has four system databases.


Table 1.6 lists each of these system databases and briefly
describes their function.
In addition, each instance of SQL Server 2000 has one or
more user databases. The pubs and Northwind user databases
are sample databases that ship with SQL Server 2000. Given
sufficient system resources, each instance of SQL Server
2000 can handle thousands of users working in multiple
databases simultaneously.
Table: System Databases in SQL Server 2000

master:
    Records all of the system-level information for a SQL
    Server 2000 system, including all other databases, login
    accounts, and system configuration settings.
tempdb:
    Stores all temporary tables and stored procedures created
    by users, as well as temporary worktables used by the
    relational database engine itself.
model:
    Serves as the template that is used whenever a new
    database is created.
msdb:
    Used by SQL Server Agent for scheduling alerts and jobs,
    and recording operators.

Physical Structure of a Database


Each database consists of at least one data file and one
transaction log file. These files are not shared with any
other database. To optimize performance and to provide fault
tolerance, data and log files are typically spread across
multiple drives and frequently use a redundant array of
independent disks (RAID).
Extents and Pages
SQL Server 2000 allocates space from a data file for tables
and indexes in 64-KB blocks called extents. Each extent
consists of eight contiguous pages of 8 KB each. There are
two types of extents: uniform extents that are owned by a
single object, and mixed extents that are shared by up to
eight objects.
A page is the fundamental unit of data storage in SQL Server
2000, with the page size being 8 KB. In general, data pages
store data in rows on each data page. The maximum amount of
data contained in a single row is 8060 bytes. Data rows are
either organized in some kind of order based on a key in a
clustered index (such as zip code), or stored in no
particular order if no clustered index exists. The beginning
of each page contains a 96-byte header that is used to store
system information, such as the amount of free space
available on the page.

Transaction Log Files


The transaction log file resides in one or more separate
physical files from the data files and contains a series of
log records, rather than pages allocated from extents. To
optimize performance and aid in redundancy, transaction log
files are typically placed on separate disks from data
files, and are frequently mirrored using RAID.

Logical Structure of a Database


Data in SQL Server 2000 is organized into database objects
that are visible to users when they connect to a database.
Table lists these objects and briefly describes their
function.
Table: Database Objects in SQL Server 2000

Tables:
    A table generally consists of columns and rows of data in
    a format similar to that of a spreadsheet. Each row in
    the table represents a unique record, and each column
    represents a field within the record. A data type
    specifies what type of data can be stored in a column.
Views:
    Views can restrict the rows or the columns of a table
    that are visible, or can combine data from multiple
    tables to appear like a single table. A view can also
    aggregate columns.
Indexes:
    An index is a structure associated with a table or view
    that speeds retrieval of rows from the table or view.
    Table indexes are either clustered or nonclustered.
    Clustering means the data is physically ordered based on
    the index key.
Keys:
    A key is a column or group of columns that uniquely
    identifies a row (PRIMARY KEY), defines the relationship
    between two tables (FOREIGN KEY), or is used to build an
    index.
User-defined data types:
    A user-defined data type is a custom data type, based on
    a predefined SQL Server 2000 data type. It is used to
    make a table structure more meaningful to programmers and
    help ensure that columns holding similar classes of data
    have the same base data type.
Stored procedures:
    A stored procedure is a group of Transact-SQL statements
    compiled into a single execution plan. The procedure is
    used for performance optimization and to control access.
Constraints:
    Constraints define rules regarding the values allowed in
    columns and are the standard mechanism for enforcing data
    integrity.
Defaults:
    A default specifies what values are used in a column in
    the event that you do not specify a value for the column
    when you are inserting a row.
Triggers:
    A trigger is a special class of stored procedure defined
    to execute automatically when an UPDATE, INSERT, or
    DELETE statement is issued against a table or view.
User-defined functions:
    A user-defined function is a subroutine made up of one or
    more Transact-SQL statements used to encapsulate code for
    reuse. A function can have a maximum of 1024 input
    parameters. User-defined functions can be used in place
    of views and stored procedures.
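To make several of these objects concrete, here is a brief Transact-SQL sketch (all names are hypothetical, not from the original text) that creates a table with a primary key, a default, a check constraint, a nonclustered index, and a view:

CREATE TABLE Students (
    StudentID INT NOT NULL PRIMARY KEY,       -- key
    LastName  VARCHAR(30) NOT NULL,
    Enrolled  DATETIME DEFAULT GETDATE(),     -- default
    Credits   INT CHECK (Credits >= 0)        -- constraint
);
GO
-- Nonclustered index to speed retrieval by last name.
CREATE NONCLUSTERED INDEX IX_Students_LastName ON Students (LastName);
GO
-- View restricting the visible columns.
CREATE VIEW StudentNames AS SELECT StudentID, LastName FROM Students;
GO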

The Security Architecture


Logins, users, roles, and groups are the foundation for the
security mechanisms of SQL Server. Users who connect to SQL
Server must identify themselves by using a Specific Login
Identifier (ID). Users can then see only the tables and
views that they are authorized to see and can execute only
the stored procedures and administrative functions that they
are authorized to execute. This system of security is based
on the IDs used to identify users.

Allocating Space for Tables and Indexes


Before SQL Server 2000 can store information in a table or
an index, free space must be allocated from within a data
file and assigned to that object. Free space is allocated
for tables and indexes in units called extents. An extent is
64 KB of space, consisting of eight contiguous pages, each 8
KB in size. There are two types of extents, mixed extents

and uniform extents. SQL Server 2000 uses mixed extents to
store small amounts of data for up to eight objects within a
single extent, whereas it uses uniform extents to store data
from a single object.
When a new table or index is created, SQL Server 2000
locates a mixed extent with a free page and allocates the
free page to the newly created object. A page contains data
for only one object. When an object requires additional
space, SQL Server 2000 allocates free space from mixed
extents until an object uses a total of eight pages.
Thereafter, SQL Server 2000 allocates a uniform extent to
that object. SQL Server 2000 will grow the data files in a
round-robin algorithm if no free space exists in any data
file and autogrow is enabled.
When SQL Server 2000 needs a mixed extent with at least one
free page, a Secondary Global Allocation Map (SGAM) page is
used to locate such an extent. Each SGAM page is a bitmap
covering 64,000 extents (approximately 4 GB) that is used to
identify allocated mixed extents with at least one free
page. Each extent in the interval that SGAM covers is
assigned a bit. The extent is identified as a mixed extent
with free pages when the bit is set to 1. When the bit is
set to 0, the extent is either a mixed extent with no free
pages, or the extent is a uniform extent.
When SQL Server 2000 needs to allocate an extent from free
space, a Global Allocation Map (GAM) page is used to locate
an extent that has not previously been allocated to an
object. Each GAM page is a bitmap that covers 64,000
extents, and each extent in the interval it covers is
assigned a bit. When the bit is set to 1, the extent is
free. When the bit is set to 0, the extent has already been
allocated.

Storing Index and Data Pages


In the absence of a clustered index, SQL Server 2000 stores
new data on any unfilled page in any available extent
belonging to the table into which the data is being
inserted. This disorganized collection of data pages is
called a heap. In a heap, the data pages are stored in no
specific order and are not linked together. In the absence
of either a clustered or a nonclustered index, SQL Server
2000 has to search the entire table to locate a record
within the table (using IAM pages to identify pages
associated with the table). On a large table, this complete
search is quite inefficient.
To speed this retrieval process, database designers create
indexes for SQL Server 2000 to use to find data pages
quickly. An index stores the value of an indexed column (or
columns) from a table in a B-tree structure. A B-tree
structure is a balanced hierarchical structure (or tree)
consisting of a root node, possible intermediate nodes, and
bottom-level leaf pages (nodes). All branches of the B-tree
have the same number of levels. A B-tree physically
organizes index records based on these key values. Each
index page is linked to adjacent index pages.
SQL Server 2000 supports two types of indexes, clustered and
nonclustered. A clustered index forces the physical ordering
of data pages within the data file based on the key value
used for the clustered index (such as last name or zip
code). The leaf level of a clustered index is the data
level. When a new data row is inserted into a table
containing a clustered index, SQL Server 2000 traverses the
B-tree structure and determines the location for the new
data row based on the ordering within the B-tree (moving
existing data and index rows as necessary to maintain the
physical ordering). See Figure 5.1.
The leaf level of a nonclustered index contains a pointer
telling SQL Server 2000 where to find the data row
corresponding to the key value contained in the nonclustered
index. When a new data row is inserted into a table
containing only a nonclustered index, a new index row is
entered into the B-tree structure, and the new data row is
entered into any page in the heap that has been allocated to
the table and contains sufficient free space. See Figure
5.2.
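As a brief illustration, here is a minimal Transact-SQL
sketch of the two index types; the Customers table and its
column names are hypothetical:

-- Clustered index: physically orders the data pages
-- within the data file by LastName
CREATE CLUSTERED INDEX IX_Customers_LastName
    ON Customers (LastName);

-- Nonclustered index: a separate B-tree whose leaf level
-- points to the data rows
CREATE NONCLUSTERED INDEX IX_Customers_ZipCode
    ON Customers (ZipCode);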

ASSIGNMENT NO. 6b

TITLE: CASE STUDY of MYSQL


THEORY:
History
Overview of MySQL AB
MySQL AB is the company of the MySQL founders and main
developers. MySQL AB was originally established in Sweden by
David Axmark, Allan Larsson, and Michael Monty Widenius.
The MySQL Web site (http://www.mysql.com/) provides the
latest information about MySQL and MySQL AB.
The AB part of the company name is the abbreviation of the
Swedish aktiebolag, or stock company; the name corresponds
roughly to MySQL, Inc.
Overview of the MySQL Database Management System
MySQL, the most popular Open Source SQL database management
system, uses the standard SQL interface. It is developed,
distributed, and supported by MySQL AB. MySQL is very
popular as a back end for web applications. The MySQL engine
can be accessed from most major programming/scripting
languages, such as Perl and PHP, making it easy to develop
applications.
- MySQL is a database management system.
- MySQL is a relational database management system.
- MySQL software is Open Source.
- The MySQL Database Server is very fast, reliable, and easy
to use.
- MySQL Server works in client/server or embedded systems.
- A large amount of contributed MySQL software is available.
The Main Features of MySQL
Internals and Portability:
Written in C and C++.
Tested with a broad range of different compilers.
Works on many different platforms.
APIs for C, C++, Java, Perl, PHP, Python, etc. are
available.
Fully multi-threaded using kernel threads. It can easily
use multiple CPUs if they are available.
Provides transactional and non-transactional storage
engines.
Uses very fast B-tree disk tables (MyISAM) with index
compression.
Relatively easy to add other storage engines. This is
useful if you want to add an SQL interface to an in-house
database.
A very fast thread-based memory allocation system.
Very fast joins using an optimized one-sweep multi-join.
In-memory hash tables, which are used as temporary tables.
SQL functions are implemented using a highly optimized
class library and should be as fast as possible. Usually
there is no memory allocation at all after query
initialization.
The MySQL code is tested with Purify (a commercial memory
leakage detector) as well as with Valgrind, a GPL tool.
The server is available as a separate program for use in a
client/server networked environment. It is also available as
a library that can be embedded (linked) into standalone
applications. Such applications can be used in isolation or
in environments where no network is available.
Data Types:
Many data types: signed/unsigned integers 1, 2, 3, 4, and
8 bytes long, FLOAT, DOUBLE, CHAR, VARCHAR, TEXT, BLOB,
DATE, TIME, DATETIME, TIMESTAMP, YEAR, SET, ENUM, and
OpenGIS spatial types.
Fixed-length and variable-length records.
Statements and Functions:
Full operator and function support in the SELECT and WHERE
clauses of queries. For example:
mysql> SELECT CONCAT(first_name, ' ', last_name)
-> FROM citizen
-> WHERE income/dependents > 10000 AND age > 30;
Full support for SQL GROUP BY and ORDER BY clauses.
Support for group functions (COUNT(), COUNT(DISTINCT ...),
AVG(), STD(), SUM(), MAX(), MIN(), and GROUP_CONCAT()).
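For example, a grouped query against the citizen table used
above might look like this (the city column is assumed for
illustration):
mysql> SELECT city, COUNT(*), AVG(income)
    -> FROM citizen
    -> GROUP BY city
    -> ORDER BY city;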
Support for LEFT OUTER JOIN and RIGHT OUTER JOIN with both
standard SQL and ODBC syntax.
Support for aliases on tables and columns as required by
standard SQL.
DELETE, INSERT, REPLACE, and UPDATE return the number of
rows that were changed (affected). It is possible to return
the number of rows matched instead by setting a flag when
connecting to the server.
The MySQL-specific SHOW statement can be used to retrieve
information about databases, storage engines, tables, and
indexes. The EXPLAIN statement can be used to determine how
the optimizer resolves a query.
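For instance (output omitted; the citizen table is again
assumed):
mysql> SHOW TABLES;
mysql> SHOW INDEX FROM citizen;
mysql> EXPLAIN SELECT * FROM citizen WHERE age > 30;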
Function names do not clash with table or column names.
For example, ABS is a valid column name. The only
restriction is that for a function call, no spaces are
allowed between the function name and the ( that follows
it.
You can mix tables from different databases in the same
query.
Security:
A privilege and password system that is very flexible and
secure, and that allows host-based verification. Passwords
are secure because all password traffic is encrypted when
you connect to a server.
Scalability and Limits:
Handles large databases. We use MySQL Server with
databases that contain 50 million records. We also know of
users who use MySQL Server with 60,000 tables and about
5,000,000,000 rows.
Up to 64 indexes per table are allowed (32 before MySQL
4.1.2). Each index may consist of 1 to 16 columns or parts
of columns. The maximum index width is 1000 bytes (767 for
InnoDB); before MySQL 4.1.2, the limit is 500 bytes. An
index may use a prefix of a column for CHAR, VARCHAR, BLOB,
or TEXT column types.
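For example, a prefix index on a hypothetical bio TEXT
column could be created like this (TEXT columns require a
prefix length):
mysql> ALTER TABLE citizen ADD INDEX idx_bio (bio(20));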
Connectivity:
Clients can connect to the MySQL server using TCP/IP
sockets on any platform. On Windows systems in the NT family
(NT, 2000, XP, 2003, or Vista), clients can connect using
named pipes. On Unix systems, clients can connect using Unix
domain socket files.
In MySQL 4.1 and higher, Windows servers also support
shared-memory connections if started with the
--shared-memory option. Clients can connect through shared
memory by using the --protocol=memory option.
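As a sketch of how a client selects a transport, the mysql
command-line client accepts a --protocol option; the host
name and user name below are placeholders:
shell> mysql --protocol=tcp --host=db.example.com --port=3306 -u appuser -p
shell> mysql --protocol=socket -u appuser -p    # Unix domain socket file
shell> mysql --protocol=pipe -u appuser -p      # Windows named pipe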
The Connector/ODBC (MyODBC) interface provides MySQL
support for client programs that use ODBC (Open Database
Connectivity) connections. For example, you can use MS
Access to connect to your MySQL server. Clients can be run
on Windows or Unix. MyODBC source is available. All ODBC 2.5
functions are supported, as are many others.
The Connector/J interface provides MySQL support for Java
client programs that use JDBC connections. Clients can be
run on Windows or Unix. Connector/J source is available.
MySQL Connector/NET enables developers to easily create
.NET applications that require secure, high-performance data
connectivity with MySQL. It implements the required ADO.NET
interfaces and integrates into ADO.NET aware tools.
Developers can build applications using their choice of .NET
languages. MySQL Connector/NET is a fully managed ADO.NET
driver written in 100% pure C#.
Localization:
The server can provide error messages to clients in many
languages. See Section 5.11.2, Setting the Error Message
Language.
Full support for several different character sets,
including latin1 (cp1252), german, big5, ujis, and more. For
example, the Scandinavian characters å, ä and ö are
allowed in table and column names. Unicode support is
available as of MySQL 4.1.
All data is saved in the chosen character set. All
comparisons for normal string columns are case-insensitive.
Sorting is done according to the chosen character set
(using Swedish collation by default). It is possible to
change this when the MySQL server is started. To see an
example of very advanced sorting, look at the Czech sorting
code. MySQL Server supports many different character sets
that can be specified at compile time and runtime.
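For example, the available character sets can be listed and
the session character set changed at runtime:
mysql> SHOW CHARACTER SET;
mysql> SET NAMES 'utf8';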
Clients and Tools:
MySQL Server has built-in support for SQL statements to
check, optimize, and repair tables. These statements are
available from the command line through the mysqlcheck
client. MySQL also includes myisamchk, a very fast
command-line utility for performing these operations on
MyISAM tables.
All MySQL programs can be invoked with the --help or -?
options to obtain online assistance.

System Architecture

Transaction Management
Transaction Overview
Transaction - A sequence of SQL statement executions that
is treated as a single unit, in which all data changes are
committed or cancelled as a whole.

Most database servers offer two transaction management
modes:
Auto Commit On: Each SQL statement is a transaction. Data
changes resulting from each statement are automatically
committed.
Auto Commit Off: Transactions are explicitly started
and ended by the client program. Data changes are not
committed unless requested by the client program.
Most database servers support the following statements
for transaction management:
Commit Statement - To commit all changes in the current
transaction.
Rollback Statement - To rollback all changes in the
current transaction.
Start Transaction Statement - To start a new transaction.
Transactions are not explicitly started on the storage
engine level, but are instead implicitly started through
calls to either start_stmt() or external_lock(). If the
preceding methods are called and a transaction already
exists, the transaction is not replaced.
The storage engine stores transaction information in
per-connection memory and also registers the transaction in
the MySQL server to allow the server to later issue COMMIT
and ROLLBACK operations.
As operations are performed the storage engine will
have to implement some form of versioning or logging to
permit a rollback of all operations executed within the
transaction.
After work is completed, the MySQL server will call
either the commit() method or the rollback() method defined
in the storage engine's handlerton.
MySQL Support of Transaction Management
MySQL support of transaction management follows these
rules:

Only two storage engines support transaction
management: InnoDB and BDB.
The default storage engine, MyISAM, doesn't support
transaction management.
To force a table to use a non-default storage engine,
you must specify the engine name in the "create table"
statement.
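For example (the table definition is hypothetical):
mysql> CREATE TABLE accounts (
    ->   id INT PRIMARY KEY,
    ->   balance DECIMAL(10,2)
    -> ) ENGINE = InnoDB;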

Statements related to transaction management:

SET AUTOCOMMIT = 0 | 1;
START TRANSACTION;
COMMIT;
ROLLBACK;
Note that:

SET AUTOCOMMIT = 1 - Turns on the auto-commit option. It
also commits and terminates the current transaction.
SET AUTOCOMMIT = 0 - Turns off the auto-commit option. It
also starts a new transaction.
By default, the auto-commit option is turned on when a new
session is established.
COMMIT - Commits the current transaction.
ROLLBACK - Rolls back the current transaction.
START TRANSACTION - Commits the current transaction and
starts a new transaction.
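A minimal session sketch tying these together; the accounts
table from the earlier example is hypothetical:
mysql> SET AUTOCOMMIT = 0;
mysql> UPDATE accounts SET balance = balance - 100 WHERE id = 1;
mysql> UPDATE accounts SET balance = balance + 100 WHERE id = 2;
mysql> COMMIT;  -- or ROLLBACK; to cancel both changes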

Transaction Isolation Levels


The impact of a transaction in the current session is
simple. However, concurrent transactions in multiple
sessions may impact each other in many ways. Three phenomena
have been observed in concurrent transactions:

Dirty Read - One transaction T1 reads uncommitted
changes from another transaction T2. If T2 performs a
rollback later, T1 may have used incorrect data from
the uncommitted changes.
Non-Repeatable Read - One transaction T1 reads a row,
which is changed and committed by another transaction
T2 later. Now if T1 reads the same row again, the
result will be different from the first read.
Phantom - One transaction T1 reads a set of rows that
satisfy a condition. Another transaction T2 then
inserts some new rows that satisfy the same condition.
If T1 repeats the same read, it will receive some
"phantom" rows.

To be able to control and avoid those phenomena, four
transaction isolation levels have been defined by the SQL
standard:

Read Uncommitted - This is the lowest isolation level.
All three phenomena are possible.

Read Committed - Dirty Read is prevented, but
Non-Repeatable Read and Phantom are possible.
Repeatable Read - Dirty Read and Non-Repeatable Read
are prevented. But Phantom is still possible.
Serializable - This is the highest isolation level. All
three phenomena are prevented.

MySQL Support of Transaction Isolation Levels


Transaction isolation levels are supported by the
InnoDB storage engine.
The default isolation level is "Repeatable Read".
The SET statement can be used to change the isolation
level for the next transaction: "SET TRANSACTION
ISOLATION LEVEL level_name".
The SET statement can be used to change the isolation
level for the entire session, starting with the next
transaction: "SET SESSION TRANSACTION ISOLATION LEVEL
level_name".

Starting a Transaction
A transaction is started by the storage engine in
response to a call to either the start_stmt() or
external_lock() methods.
If there is no active transaction, the storage engine
must start a new transaction and register the transaction
with the MySQL server so that ROLLBACK or COMMIT can later
be called.
Implementing ROLLBACK
Of the two major transactional operations, ROLLBACK is
the more complicated to implement. All operations that
occurred during the transaction must be reversed so that all
rows are unchanged from before the transaction began.
To support ROLLBACK, create a method that matches this
definition:
int (*rollback)(THD *thd, bool all);
The method name is then listed in the rollback
(thirteenth) entry of the handlerton.
The THD parameter is used to identify the transaction
that needs to be rolled back, while the bool all parameter
indicates whether the entire transaction should be rolled
back or just the last statement.

Details of implementing a ROLLBACK operation will vary
by storage engine.
Implementing COMMIT
During a commit operation, all changes made during a
transaction are made permanent and a rollback operation is
not possible after that. Depending on the transaction
isolation used, this may be the first time such changes are
visible to other threads.
To support COMMIT, create a method that matches this
definition:
int (*commit)(THD *thd, bool all);
The method name is then listed in the commit (twelfth)
entry of the handlerton.
The THD parameter is used to identify the transaction
that needs to be committed, while the bool all parameter
indicates if this is a full transaction commit or just the
end of a statement that is part of the transaction.
Details of implementing a COMMIT operation will vary by
storage engine. If the server is in auto-commit mode, the
storage engine should automatically commit all read-only
statements such as SELECT. In a storage engine,
"auto-committing" works by counting locks: increment the
count for every call to external_lock(), and decrement it
when external_lock() is called with an argument of F_UNLCK.
When the count drops to zero, trigger a commit.
Adding Support for Savepoints
The storage engine must declare the amount of memory it
needs to store savepoint information. This should be a
fixed size, preferably not large, as the MySQL server will
allocate space to store the savepoint for all storage
engines with each named savepoint.
When a COMMIT or ROLLBACK operation occurs (with bool
all set to true), all savepoints are assumed to be released.
If the storage engine allocates resources for savepoints, it
should free them.
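From the SQL side, savepoints are used like this (the table
and data are hypothetical):
mysql> START TRANSACTION;
mysql> INSERT INTO accounts VALUES (3, 500.00);
mysql> SAVEPOINT sp1;
mysql> UPDATE accounts SET balance = 0 WHERE id = 3;
mysql> ROLLBACK TO SAVEPOINT sp1;  -- undoes the UPDATE, keeps the INSERT
mysql> COMMIT;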

Indexing and Storage


Indexing
Indexes are a special system that databases use to
improve the overall performance. By setting indexes on your
tables, you are telling MySQL to pay particular attention to
that column (in layman's terms). In fact, MySQL creates
extra files to store and track indexes efficiently.
As noted above, MySQL allows up to 64 indexes per table
(32 before MySQL 4.1.2), and each index can incorporate up
to 16 columns. While a multicolumn index may not seem
obviously useful, it comes in handy for searches frequently
performed on the same set of multiple columns (e.g., first
and last name, city and state, etc.).
Indexes are used to find rows with specific column
values quickly. Without an index, MySQL must begin with the
first row and then read through the entire table to find the
relevant rows. The larger the table, the more this costs. If
the table has an index for the columns in question, MySQL
can quickly determine the position to seek to in the middle
of the data file without having to look at all the data. If
a table has 1,000 rows, this is at least 100 times faster
than reading sequentially. If you need to access most of the
rows, it is faster to read sequentially, because this
minimizes disk seeks.
Indexes are a way to increase performance and
efficiency in a database table. If you have a table with
many columns but you are always doing searches on one or two
of those columns, you can tell MySQL to index those columns.
When you do a search (or a sort) using an indexed column,
the MySQL engine only has to process the much smaller index
instead of the entire table to find the right rows. You can
also specify that an index is unique, which is an even
bigger performance benefit: once the engine finds the value,
it can stop, because there can't be another one like it. You
can add an index to a table with the command alter table
<table> add <index|unique> <index_name>
(<column>[,column2...]), as shown below. Indexing is a must
on large tables; performance can be horrible without
indexes.
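For example, using the citizen table from earlier (the ssn
column is hypothetical):
mysql> ALTER TABLE citizen ADD INDEX idx_name (last_name, first_name);
mysql> ALTER TABLE citizen ADD UNIQUE idx_ssn (ssn);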
Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and
FULLTEXT) are stored in B-trees. Exceptions are that indexes
on spatial data types use R-trees, and that MEMORY tables
also support hash indexes.
MySQL uses indexes for these operations:
To find the rows matching a WHERE clause quickly.
To eliminate rows from consideration. If there is a
choice between multiple indexes, MySQL normally uses
the index that finds the smallest number of rows.
To retrieve rows from other tables when performing
joins.
To find the MIN() or MAX() value for a specific indexed
column key_col.
For LIKE comparisons, if the argument to LIKE is a
constant string that does not start with a wildcard
character.
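The last point can be illustrated as follows:
mysql> SELECT * FROM citizen WHERE last_name LIKE 'Smi%';
-- can use an index on last_name
mysql> SELECT * FROM citizen WHERE last_name LIKE '%son';
-- cannot use the index: the pattern starts with a wildcard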

Sometimes MySQL does not use an index, even if one is
available. One circumstance under which this occurs is when
the optimizer estimates that using the index would require
MySQL to access a very large percentage of the rows in the
table. (In this case, a table scan is likely to be much
faster because it requires fewer seeks.) However, if such a
query uses LIMIT to retrieve only some of the rows, MySQL
uses an index anyway, because it can much more quickly find
the few rows to return in the result.
Storage
Data in MySQL is stored in files (or memory) using a
variety of different techniques. Each of these techniques
employs different storage mechanisms, indexing facilities,
locking levels and ultimately provides a range of different
functions and capabilities. By choosing a different
technique you can gain additional speed or functionality
benefits that will improve the overall usefulness of your
application.
For example, if you work with a large amount of
temporary data, you may want to make use of the MEMORY
storage engine, which stores all of the table data in
memory. Alternatively, you may want a database that supports
transactions (to ensure data resilience).
Each of these different techniques and suites of
functionality within the MySQL system is referred to as a
storage engine (also known as a table type). By default,
MySQL comes with a number of different storage engines
pre-configured and enabled in the MySQL server. You can
select the storage engine to use on a server, database, and
even table basis, providing you with the maximum amount of
flexibility when it comes to choosing how your information
is stored, how it is indexed, and what combination of
performance and functionality you want to use with your
data.
This flexibility to choose how your data is stored and
indexed is a major reason why MySQL is so popular; other
database systems, including most of the commercial options,
support only a single type of database storage.
Unfortunately, the 'one size fits all' approach of these
other solutions means that either you sacrifice performance
for functionality, or you have to spend hours or even days
finely tuning your database. With MySQL, we can just change
the engine we are using.
Programmatically, this is nothing special; it is normal
practice to divide a program into modules and layers. But it
is unique for a DBMS (Database Management System), because a
developer and even a DBA (Database Administrator) are
traditionally insulated from the physical storage methods
that the database server may employ. How the data is stored
really does not concern them, as the server just takes care
of everything. That being the case, a developer or DBA could
benefit from knowing a bit more about such things as it may
help them to optimize applications. This is an angle that
may be applied to many aspects of database servers, but in
this article we'll focus on the storage engines.
Why have a storage engine layer? There are a number of
interrelated reasons:
Technology evolves. As new features are developed,
maintaining backward compatibility in the file format is not
always possible. Users would need to run a conversion tool
when they upgrade, or even dump/import their entire dataset.
This is obviously very inconvenient. It would be much nicer
if users could upgrade their server (for bug fixes and other
new features) without also having to migrate all their data.
This means that a single version of the server has to
support multiple file formats.
For server developers, changes in the data storage code may
require related changes elsewhere in the server, and as
with all new code, there is always the possibility of
introducing bugs. This calls for abstraction: changes in the
underlying code, to a large extent, should not affect the
code at higher levels.

Different applications have different requirements with
regard to data storage, and some of these requirements may
even conflict. Think of a banking application that requires
highly secure transaction processing, versus traffic logging
on a website. Typically, there are differences in the number
and balance of selects and updates, as well as the need for
transactions and isolation levels. There are always
trade-offs, and choices need to be made. With only one
mechanism available, most applications would just have to
make do with a solution that is probably not optimal for
them. While
accepting that there is no single tool suitable for every
use, we think that there is something to be said for a
moderate "Swiss army knife" style approach. It would be nice
if a server can cater effectively to more than one type of
application.
Fundamentally, different storage media call for a different
approach. A hard disk has characteristics which differ
wildly from RAM, for instance. In a nutshell, a hard disk
can generally contain more data, but getting to it takes
longer. RAM is very fast, but there is a limited supply of
it. Some search algorithms are optimized for RAM, others are
optimized for disk-based storage. And did you know that a
Compact Flash card uses much more power when reading data?
That is an issue that definitely needs to be considered for
an embedded application. Who knows what other new
technologies we will see in the future.
MySQL's storage engine architecture addresses all these
aspects, and not by accident. It was a deliberate design
choice by Michael "Monty" Widenius, MySQL AB's CTO.
Let us look at a simplified high-level diagram of the MySQL
server architecture (not reproduced here). The diagram shows
four storage engines, each with different characteristics:

MyISAM is a disk-based storage engine. Aiming for very
low overhead, it does not support transactions.
InnoDB is also disk-based, but offers versioned, fully
ACID transactional capabilities. InnoDB requires more
disk space than MyISAM to store its data, and this
increased overhead is compensated by more aggressive
use of memory caching, in order to attain high speeds.
Memory (formerly called "HEAP") is a storage engine
that utilizes only RAM. Special algorithms are used
that make optimal use of this environment. It is very
fast.
NDB, the MySQL Cluster Storage engine, connects to a
cluster of nodes, offering high availability through
redundancy, high performance through fragmentation
(partitioning) of data across multiple node groups, and
excellent scalability through the combination of these
two. NDB uses main-memory only, with logging to disk.

One of the things that differs per storage engine is the
locking and isolation mechanism, but most of the server
operates in the same way no matter what storage engine is
used: all the usual SQL commands are independent of the
storage engine. Naturally, the optimizer may need to make
different choices depending on the storage engine, but this
is all handled through a standardized interface (API) which
each storage engine supports.
So to a degree, the application does not need to know how
its data is stored. And it may not matter either, when the
demands are not very high. But for a larger dataset, or with
more demanding access requirements, it does become
increasingly important to make a conscious choice. And the
best news is that an application can use multiple storage
engines, as the selection can be made on a per-table basis.
Also, the server can convert tables between the different
formats using a simple ALTER TABLE command.
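For example, converting the (hypothetical) citizen table to
InnoDB:
mysql> ALTER TABLE citizen ENGINE = InnoDB;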

Default Storage Engine


If you use CREATE TABLE without specifying the ENGINE=...
option, the server will use the default. The default storage
engine is MyISAM. If you want to change the default to, say,
InnoDB, you can use the configuration directive
--default-storage-engine=InnoDB.
Something to be aware of is that if you create a table
specifying an engine type that is not enabled, MySQL will
automatically fall back to the default. From MySQL 4.1, a
warning is issued.
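To see which engines are enabled, and to verify which engine
a table actually ended up with, the following statements can
be used:
mysql> SHOW ENGINES;
mysql> SHOW CREATE TABLE citizen;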

Query Processing
The query processing steps:
1. Parser (builds tree)
2. Preprocessor (checks syntax, columns)
3. Optimizer (generates query execution plan)
- query transformation
- search for optimal execution plan
- plan is refined
4. Query sent to execution engine
A query has only a few pre-defined operations, which eases
the task of processing a query:
- access methods (table scan or index)
- WHERE conditions
- joins
- union, group, etc.
MySQL uses a left-deep linear plan for executing a query.
All of the tables fall into a single line. Many other
systems use the bushy plan, which is more tree-like.
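A left-deep plan can be observed with EXPLAIN: the output
lists one row per table, top to bottom, in join order
(tables t1, t2, and t3 are hypothetical):
mysql> EXPLAIN SELECT *
    -> FROM t1
    -> JOIN t2 ON t2.t1_id = t1.id
    -> JOIN t3 ON t3.t2_id = t2.id;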
Timour (a MySQL optimizer developer) shows a large query
with 5 or 6 WHERE conditions and steps through the process
of how the query is parsed.
In optimizing a SQL statement there is quite a bit of
analysis of the cost of a query. The cost is calculated by
looking at things like how many times the disk will need to
be accessed, the number of pages per table, the length of
the rows and keys, and the data schema (key uniqueness,
etc.). Determining costs involves mathematical operations
to estimate the cost of different methods. The type of
storage engine isn't considered in the cost.

MySQL 5.0 uses a greedy search: it doesn't consider
every possible plan, but gathers just enough information to
find a good path and then moves on.

User Interfaces for MySQL - EMS MySQL Manager

Full support of MySQL versions from 3.23 to 5.0
New state-of-the-art graphical user interface
Rapid database management and navigation
Simple management of all MySQL objects
Advanced data manipulation tools
Powerful security management
Excellent visual and text tools for query building
Impressive data export and import capabilities
Easy-to-use wizards performing MySQL services

Applications

Billing Management
Compliance & Risk Management
Customer Relationship Management (CRM)
Demand Chain Management (DCM)
Education
Enterprise Content Management (ECM)
Enterprise Information Portal (EIP)
Enterprise Resources Planning (ERP)
Financials
Government
Healthcare
Human Resources Management (HRMS)
Inventory Management
Manufacturing
Messaging & Collaboration
Order Management
Payroll Management
Point of Sale (POS)
Project Management
Purchasing Management
Retail
Supply Chain Management (SCM)
