Indexes are useful for many applications but come with some limitations.
Consider the following SQL statement: SELECT first_name FROM people
WHERE last_name = 'Smith'; To process this statement without an index, the
database software must look at the last_name column on every row in the
table (this is known as a full table scan). With an index, the database simply
follows the B-tree data structure until the Smith entry has been found; this is
much less computationally expensive than a full table scan.
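The difference between the two access paths can be observed directly. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in database; the sample rows, the index name idx_people_last_name and the helper query_plan are assumptions made for illustration, and other database products report their plans differently.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the plan text

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "Smith"), ("Bob", "Jones"), ("Cy", "Smith"), ("Di", "Lee")],
)

QUERY = "SELECT first_name FROM people WHERE last_name = 'Smith'"

# Without an index the only option is to scan every row of the table.
plan_without_index = query_plan(conn, QUERY)

# With an index on last_name the engine can search the index instead.
conn.execute("CREATE INDEX idx_people_last_name ON people (last_name)")
plan_with_index = query_plan(conn, QUERY)

matches = sorted(row[0] for row in conn.execute(QUERY))

print(plan_without_index)  # a SCAN step over the table
print(plan_with_index)     # a SEARCH step that names the index
print(matches)
```

The query returns the same rows in both cases; only the access path changes.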
B-Tree
In the B-Tree described here, the records are stored in the leaves, the nodes
at the bottom of the tree that have no children. The order of the tree
determines the maximum number of children per node. The depth determines the
number of disk accesses required to reach a record. A B-Tree can hold millions
or even billions of records. Not every slot in a leaf necessarily holds a
record, but more than half of them do.
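The relationship between order, depth and record count can be made concrete with a little arithmetic. The sketch below assumes an idealized tree in which every node has exactly as many children as the order allows; the function name levels_needed is hypothetical, and real trees are only partly full, so they may need an extra level.

```python
def levels_needed(num_records, order):
    """Smallest number of levels h such that order**h >= num_records."""
    if num_records < 1 or order < 2:
        raise ValueError("need at least one record and an order of at least 2")
    levels, capacity = 1, order
    while capacity < num_records:
        capacity *= order  # each extra level multiplies the reachable records
        levels += 1
    return levels

# A billion records with 100 children per node versus 2 children per node.
print(levels_needed(1_000_000_000, 100))
print(levels_needed(1_000_000_000, 2))
```

Because each level corresponds to roughly one disk access, a high order keeps the number of accesses small even for very large tables.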
When the decision points of the tree, which are called nodes, reside on the
hard disk instead of in random access memory (RAM), the B-Tree is the
preferred technique. A hard disk can be a thousand or more times slower than
RAM because disk access relies on mechanical parts, whereas RAM access is
purely electronic.
The nodes in a B-Tree can have a variable number of child nodes within a
range pre-defined by the system. When data is inserted into or removed from a
node, its number of child nodes can change; to keep that number within the
pre-defined range, internal nodes may be split or joined.
B-Trees do not need frequent re-balancing because a node may hold any number
of children between the lower and upper bounds, which are typically fixed. As
an example, a 2-3 B-Tree implementation has internal nodes that can only have
2 or 3 child nodes.
To keep the B-Tree well balanced, all leaf nodes are required to be at the
same depth. The depth increases slowly and infrequently as elements are added.
Searching in a B-Tree starts at the root and traverses the tree from top to
bottom. Insertion begins by locating the leaf node where the new element
belongs. If that node still has room, that is, the maximum legal number of
elements would not be exceeded, the element is inserted there. Otherwise, the
leaf node splits into two nodes. Deletion in a B-Tree has two strategies. The
first locates the item to be deleted, deletes it immediately, and then
restructures the tree. The second traverses down the tree and restructures it
along the way, before the item is deleted.
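Search and insertion with node splitting can be sketched as follows. This is a minimal in-memory illustration and not the implementation of any particular database: it assumes the common textbook variant in which keys live in every node (not only in the leaves) and full nodes are split on the way down. The names Node, BTree, leaf_depths and the parameter t (the minimum degree, so a node holds at most 2t - 1 keys and 2t children) are all assumptions. Deletion is omitted.

```python
from bisect import bisect_right, insort

class Node:
    def __init__(self, leaf=True):
        self.keys = []      # sorted keys held in this node
        self.children = []  # child nodes; empty for a leaf
        self.leaf = leaf

class BTree:
    def __init__(self, t=2):
        self.t = t          # at most 2t - 1 keys and 2t children per node
        self.root = Node()

    def search(self, key):
        node = self.root    # every search starts at the root
        while True:
            i = bisect_right(node.keys, key)
            if i > 0 and node.keys[i - 1] == key:
                return True
            if node.leaf:
                return False
            node = node.children[i]

    def _split_child(self, parent, i):
        """Split the full child parent.children[i] around its median key."""
        t, full = self.t, parent.children[i]
        right = Node(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys, full.keys = full.keys[t:], full.keys[:t - 1]
        if not full.leaf:
            right.children, full.children = full.children[t:], full.children[:t]
        parent.keys.insert(i, median)        # median moves up into the parent
        parent.children.insert(i + 1, right)

    def insert(self, key):
        max_keys = 2 * self.t - 1
        if len(self.root.keys) == max_keys:  # a full root grows the tree by one level
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        node = self.root
        while not node.leaf:
            i = bisect_right(node.keys, key)
            if len(node.children[i].keys) == max_keys:
                self._split_child(node, i)   # make room before descending
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        insort(node.keys, key)               # the leaf is guaranteed to have room

def leaf_depths(tree):
    """Return the set of depths at which leaves occur."""
    depths, stack = set(), [(tree.root, 0)]
    while stack:
        node, depth = stack.pop()
        if node.leaf:
            depths.add(depth)
        else:
            stack.extend((child, depth + 1) for child in node.children)
    return depths

tree = BTree(t=2)
for key in [10, 20, 5, 6, 12, 30, 7, 17]:
    tree.insert(key)
print(tree.search(12), tree.search(99), leaf_depths(tree))
```

Because the tree only grows by splitting the root, every leaf stays at the same depth, which is why leaf_depths always returns a single value.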
In a file system, a file may contain any number of B-Trees, and each B-Tree
must have a unique name composed of any string of characters. For each B-Tree
name, the file stores an item containing the number of the root node of
that B-Tree. Searching, inserting and deleting in a B-Tree start from
the root node.
Indexing:
In a nutshell, a database index is an auxiliary data structure which allows for
faster retrieval of data stored in the database. An index is keyed on a
specific column so that queries like “Give me all people with a last name of
‘Smith’” are fast.
Types of indexes
There are five types of indexes: unique and non-unique indexes, clustered and non-clustered
indexes, and system-generated block indexes for multidimensional clustered (MDC) tables.
When attempting to create a unique index for a table that already contains data, values in the
column or columns that comprise the index are checked for uniqueness; if the table contains
rows with duplicate key values, the index creation process fails. Once a unique index has been
defined for a table, uniqueness is enforced whenever keys are added or changed within the
index. (This includes insert, update, load, import, and set integrity, to name a few.) In addition to
enforcing the uniqueness of data values, a unique index can also be used to improve data
retrieval performance during query processing.
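The two checks described above (at index creation and on later changes) can be tried out in a small experiment. This sketch uses Python's built-in sqlite3 module as a stand-in; the table, the column email and the index name idx_people_email are hypothetical, and the database manager the text describes may report such failures with different errors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (email TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("a@example.com",), ("a@example.com",)],  # duplicate key values
)

# 1. Creating a unique index over existing duplicates fails.
try:
    conn.execute("CREATE UNIQUE INDEX idx_people_email ON people (email)")
    creation_failed = False
except sqlite3.IntegrityError:
    creation_failed = True

# 2. After the duplicate row is removed, the same statement succeeds.
conn.execute(
    "DELETE FROM people WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM people GROUP BY email)"
)
conn.execute("CREATE UNIQUE INDEX idx_people_email ON people (email)")

# 3. From now on, uniqueness is enforced on every insert.
try:
    conn.execute("INSERT INTO people VALUES ('a@example.com')")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(creation_failed, insert_rejected, row_count)
```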
Non-unique indexes, on the other hand, are not used to enforce constraints on the tables with
which they are associated. Instead, non-unique indexes are used solely to improve query
performance by maintaining a sorted order of data values that are used frequently.
Both clustered and non-clustered indexes contain only keys and record IDs in the index
structure. The record IDs always point to rows in the data pages. The only difference between
clustered and non-clustered indexes is that, for a clustered index, the database manager
attempts to keep the data in the data pages in the same order as the corresponding keys appear
in the index pages. Thus the database manager will attempt to insert rows with similar keys
onto the same pages. If the table is reorganized, its rows are inserted into the data pages in
the order of the index keys. Reorganizing a table with respect to a chosen index re-clusters
the data. A clustered index is most useful for columns that have range predicates because it
allows better sequential access of data in the table. This results in fewer page fetches, since
like values are on the same data page.
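The effect of clustering on a range predicate can be illustrated with a toy model that counts how many data pages a range query touches. Everything here is an assumption made for illustration: the page size of 10 rows, the 100 integer keys, and the fixed permutation used to represent a table whose rows are not stored in key order.

```python
PAGE_SIZE = 10  # rows per data page (hypothetical)

def pages_touched(positions, low, high):
    """Count the distinct data pages holding rows whose key is in [low, high]."""
    return len({positions[key] // PAGE_SIZE for key in range(low, high + 1)})

keys = range(100)

# Clustered: the physical row position follows the key order.
clustered = {key: key for key in keys}

# Not clustered: a fixed permutation scatters neighbouring keys across pages.
scattered = {key: (key * 37) % 100 for key in keys}

clustered_pages = pages_touched(clustered, 20, 39)
scattered_pages = pages_touched(scattered, 20, 39)
print(clustered_pages, scattered_pages)
```

In this model the same 20 rows sit on two adjacent pages when the data is clustered, but are spread over every page of the table when it is not.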
In general, only one of the indexes in a table can have a high degree of clustering.
However, clustering indexes cannot be specified as part of the table definition used with the
CREATE TABLE statement. Instead, clustering indexes are only created by executing the
CREATE INDEX statement with the CLUSTER option specified. Then the ALTER TABLE
statement should be used to add to the table a primary key that corresponds to the clustering
index. This clustering index will then be used as the table's primary key index.
Note: Setting PCTFREE in the table to an appropriate value using the ALTER TABLE statement
can help the table remain clustered by leaving adequate free space to insert rows in the pages
with similar values. For more information, see the ALTER TABLE statement and Reducing the
need to reorganize tables and indexes.
In a unique index, null values are treated like any other value. Therefore, if a unique index
consists of a single column, only one null value is allowed; more than one null value would
violate the unique constraint. Similarly, if a unique index consists of multiple columns, a
specific combination of values and nulls can be used only once.
Bi-directional indexes
By default, bi-directional indexes allow scans in both the forward and reverse directions. The
ALLOW REVERSE SCANS clause of the CREATE INDEX statement enables both forward and
reverse index scans, that is, in the order defined at index creation time and in the opposite (or
reverse) order. This option allows you to:
• Facilitate MIN and MAX functions
• Fetch previous keys
• Eliminate the need for the database manager to create a temporary table for the reverse
scan
• Eliminate redundant reverse order indexes
If DISALLOW REVERSE SCANS is specified then the index cannot be scanned in reverse
order. (But physically it will be exactly the same as an ALLOW REVERSE SCANS index.)
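Why one sorted structure can serve both scan directions is easy to see with a sorted list standing in for the index keys. This is a conceptual sketch only; the key values and the function names are hypothetical and say nothing about how the database manager lays out its index pages.

```python
from bisect import bisect_left

index_keys = [3, 8, 15, 23, 42]  # keys in the order defined at index creation

def forward_scan(keys):
    return list(keys)

def reverse_scan(keys):
    # The same structure read from the other end; no second index is needed.
    return list(reversed(keys))

def min_key(keys):
    return keys[0]   # first entry of a forward scan

def max_key(keys):
    return keys[-1]  # first entry of a reverse scan

def previous_key(keys, key):
    """Return the largest stored key strictly smaller than key, or None."""
    i = bisect_left(keys, key)
    return keys[i - 1] if i > 0 else None

print(forward_scan(index_keys))
print(reverse_scan(index_keys))
print(min_key(index_keys), max_key(index_keys), previous_key(index_keys, 23))
```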