B-tree queries and indexing

Indices are useful for many applications but come with some limitations.
Consider the following SQL statement: SELECT first_name FROM people
WHERE last_name = 'Smith';. To process this statement without an index the
database software must look at the last_name column on every row in the
table (this is known as a full table scan). With an index the database simply
follows the B-tree data structure until the Smith entry has been found; this is
much less computationally expensive than a full table scan.
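
The difference between the two access paths can be observed with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient stand-in for "the database software"; the index name idx_people_last_name and the helper query_plan are made up for illustration, and the exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "Smith"), ("Bob", "Jones"), ("Cy", "Smith")],
)

QUERY = "SELECT first_name FROM people WHERE last_name = 'Smith'"

def query_plan(connection, sql):
    """Return SQLite's human-readable plan for `sql` as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)  # last column holds the description

# Without an index, the plan is a scan over the whole table.
plan_without_index = query_plan(conn, QUERY)

# With an index on last_name, the plan becomes a search through that index.
conn.execute("CREATE INDEX idx_people_last_name ON people (last_name)")
plan_with_index = query_plan(conn, QUERY)

print(plan_without_index)
print(plan_with_index)
```

On a three-row table the cost difference is negligible; the point is only that the planner switches from scanning every row to descending the index.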

Consider this SQL statement: SELECT email_address FROM customers
WHERE email_address LIKE '%@yahoo.com';. This query would yield an email
address for every customer whose email address ends with "@yahoo.com",
but even if the email_address column has been indexed the database must
still perform a full table scan. This is because the index orders values by
comparing them from left to right, starting with the first character. With a
wildcard at the beginning of the search term, the database software is unable
to use the underlying B-tree data structure (in other words, the WHERE clause
is not sargable). This problem can be solved by adding another index, created
on reverse(email_address), and using a SQL query like this: SELECT
email_address FROM customers WHERE reverse(email_address) LIKE
reverse('%@yahoo.com');. This puts the wildcard at the right-most part of
the pattern (now moc.oohay@%), which the index on reverse(email_address)
can satisfy.
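
The reversal trick can be illustrated without a database. In the sketch below, a sorted Python list of reversed strings plays the role of the index on reverse(email_address); the sample addresses and the function name ends_with are invented for the example. A suffix search on the original strings becomes a prefix range scan on the reversed ones.

```python
import bisect

emails = ["ann@yahoo.com", "bob@gmail.com", "cy@yahoo.com", "dee@example.org"]

# Sorted reversed strings stand in for an index on reverse(email_address).
reversed_index = sorted(e[::-1] for e in emails)

def ends_with(index, suffix):
    """Find all original strings ending in `suffix` via a prefix range scan."""
    prefix = suffix[::-1]                    # '@yahoo.com' -> 'moc.oohay@'
    start = bisect.bisect_left(index, prefix)  # binary search to the first candidate
    matches = []
    for key in index[start:]:
        if not key.startswith(prefix):
            break                            # sorted order: no later key can match
        matches.append(key[::-1])            # undo the reversal
    return sorted(matches)

yahoo = ends_with(reversed_index, "@yahoo.com")
print(yahoo)
```

Only the entries inside the matching range are visited, which is what a B-tree range scan achieves on disk.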

B-Tree

B-Tree is an indexing technique most commonly used in databases and file
systems, where pointers to data are placed in a balanced tree structure so
that any piece of data can be reached in about the same number of steps. It
is a tree data structure that keeps data sorted so that searching, inserting
and deleting can be done in logarithmic time.

The B-Tree belongs to a group of techniques in computer science known as
self-balancing search trees, which automatically keep the number of levels
of nodes under the root small at all times. Such trees are a common way to
implement sets, associative arrays and other data structures used in
programming languages, relational database management systems and low-level
data manipulation.

In the B-Tree described here, the records are stored in the leaves, the
nodes at the bottom of the tree that have no children. The order of the tree
determines the maximum number of children per node. The depth determines the
number of disk accesses required to reach a record. A B-Tree can hold
millions or even billions of records. Not every slot in a leaf necessarily
holds a record, but at least half of them do.
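
As a rough illustration of why the depth stays small, the sketch below counts how many levels are needed if every node has exactly `order` children. That is the best case and an assumption of this example, and the function name levels_needed is hypothetical.

```python
def levels_needed(num_records, order):
    """Smallest number of levels whose capacity (order ** levels) covers num_records.

    Assumes every node is completely full, so real trees may need one more level.
    """
    levels = 1
    capacity = order
    while capacity < num_records:
        capacity *= order   # each extra level multiplies capacity by the fan-out
        levels += 1
    return levels

print(levels_needed(1_000_000, 100))
print(levels_needed(1_000_000_000, 100))
```

Under this assumption, a fan-out of 100 reaches a million records in three levels and a billion in five, so a lookup costs only a handful of disk accesses.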

When the decision points of the tree, which are called nodes, are stored on
a hard disk instead of in random access memory (RAM), the B-Tree is the
preferred technique. A hard disk can be a thousand times slower than RAM, or
more, because reading from it involves moving mechanical parts, whereas RAM
is accessed purely electronically. The wide nodes of a B-Tree keep the tree
shallow, so fewer disk reads are needed per lookup.

The nodes in a B-Tree can have a variable number of child nodes within a
range pre-defined by the system. When data is inserted into or removed from
a node, the number of child nodes may change, but it must stay within the
pre-defined range, so internal nodes may have to be split or joined.

B-trees do not need frequent re-balancing as both the upper and lower
bounds on the number of child nodes are typically fixed. As an example, a
2-3 B-Tree implementation has internal nodes that can only have 2 or 3 child
nodes.

To keep the B-tree well balanced, all leaf nodes are required to be of the
same depth. The depth only increases very slowly and infrequently.

Searching in a B-Tree starts from the root and proceeds from top to bottom.
Insertion begins by looking for the leaf node where the new element belongs.
If there is still room, that is, the maximum legal number of elements is not
exceeded, the insertion takes place. Otherwise, the leaf node is split into
two nodes. Deletion in a B-Tree has two strategies. The first locates the
item, deletes it immediately and then restructures the tree. The second
traverses down the tree and restructures it before deleting, so that no
further restructuring is needed afterwards.
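
The search and insertion steps above can be sketched as follows. This is a minimal in-memory illustration, not the layout of any particular database: the names Node, BTree and leaf_depths are invented here, keys live in every node (the classic B-tree form), and full nodes are split on the way down, which is one common textbook strategy among several. With t = 2 each node holds at most 3 keys.

```python
import bisect

class Node:
    def __init__(self, leaf=True):
        self.keys = []        # sorted keys stored in this node
        self.children = []    # child nodes; empty for a leaf
        self.leaf = leaf

class BTree:
    def __init__(self, t=2):
        self.t = t            # minimum degree: a node holds at most 2*t - 1 keys
        self.root = Node()

    def search(self, key, node=None):
        """Descend from the root, choosing one child per level."""
        if node is None:
            node = self.root
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if node.leaf:
            return False
        return self.search(key, node.children[i])

    def _split_child(self, parent, i):
        """Split the full child parent.children[i]; its median moves up."""
        t = self.t
        full = parent.children[i]
        right = Node(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]
        full.keys = full.keys[:t - 1]
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)
        parent.children.insert(i + 1, right)

    def insert(self, key):
        if len(self.root.keys) == 2 * self.t - 1:
            # A full root is split first; this is the only way the tree grows taller.
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        node = self.root
        while not node.leaf:
            i = bisect.bisect_right(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        bisect.insort(node.keys, key)  # the leaf is guaranteed to have room

def leaf_depths(node, depth=0):
    """Return the set of depths at which leaves occur (one value if balanced)."""
    if node.leaf:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))

tree = BTree(t=2)
for k in range(1, 21):
    tree.insert(k)
print(tree.search(7), tree.search(42), leaf_depths(tree.root))
```

Because new levels are only ever added at the root, every leaf ends up at the same depth, which is what leaf_depths checks.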

In a file system, a file may contain any number of B-Trees, and each B-Tree
must have a unique name composed of any string of characters. Each B-Tree
name is saved in the file along with an item containing the number of the
root node of the B-Tree. Searching, inserting and deleting in the B-Tree all
start from the root node.

Indexing:
In a nutshell, a database index is an auxiliary data structure which allows
for faster retrieval of data stored in the database. An index is keyed on a
specific column so that queries like “Give me all people with a last name of
‘Smith’” are fast.
Types of indexes
There are five types of indexes: unique and non-unique indexes, clustered and non-clustered
indexes, and system-generated block indexes for multidimensional clustered (MDC) tables.

Unique and non-unique indexes


Unique indexes are indexes that help maintain data integrity by ensuring that no two rows of
data in a table have identical key values.

When attempting to create a unique index for a table that already contains data, values in the
column or columns that comprise the index are checked for uniqueness; if the table contains
rows with duplicate key values, the index creation process fails. Once a unique index has been
defined for a table, uniqueness is enforced whenever keys are added or changed within the
index. (This includes insert, update, load, import, and set integrity, to name a few.) In addition to
enforcing the uniqueness of data values, a unique index can also be used to improve data
retrieval performance during query processing.

Non-unique indexes, on the other hand, are not used to enforce constraints on the tables with
which they are associated. Instead, non-unique indexes are used solely to improve query
performance by maintaining a sorted order of data values that are used frequently.
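
The uniqueness checks described above can be tried out in a small sketch. It uses Python's built-in sqlite3 module as a stand-in database, so it only illustrates the general behaviour and not the database manager this section describes; the table contents, the index name idx_email and the helper try_sql are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email_address TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("a@example.com",), ("a@example.com",)],  # two rows with the same key value
)

def try_sql(connection, sql):
    """Run `sql`; return True on success, False if the database rejects it."""
    try:
        connection.execute(sql)
        return True
    except sqlite3.DatabaseError:  # base class that covers constraint violations
        return False

CREATE_UNIQUE = "CREATE UNIQUE INDEX idx_email ON customers (email_address)"

# Index creation fails while the table still contains duplicate key values.
created_with_duplicates = try_sql(conn, CREATE_UNIQUE)

# After removing the duplicates, the same statement succeeds.
conn.execute("DELETE FROM customers")
conn.execute("INSERT INTO customers VALUES ('a@example.com')")
created_after_cleanup = try_sql(conn, CREATE_UNIQUE)

# From now on, inserting an existing key value is rejected.
duplicate_insert_ok = try_sql(conn, "INSERT INTO customers VALUES ('a@example.com')")

print(created_with_duplicates, created_after_cleanup, duplicate_insert_ok)
```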

Clustered and non-clustered indexes


Index architectures are classified as clustered or non-clustered. A clustered index is one for
which the order of the rows in the data pages corresponds to the order of the rows in the index.
This is why only one clustered index can exist in a given table, whereas many non-clustered
indexes can exist in the table. In some relational database management systems, the leaf node of
the clustered index corresponds to the actual data, not a pointer to data that resides elsewhere.

Both clustered and non-clustered indexes contain only keys and record IDs in the index
structure. The record IDs always point to rows in the data pages. The only difference between
clustered and non-clustered indexes is that the database manager attempts to keep the data in
the data pages in the same order as the corresponding keys appear in the index pages. Thus
the database manager will attempt to insert rows with similar keys onto the same pages. If the
table is reorganized, its rows are inserted into the data pages in the order of the index keys.

Reorganizing a table with respect to a chosen index re-clusters the data. A clustered index is
most useful for columns that have range predicates because it allows better sequential access
of data in the table. This results in fewer page fetches, since like values are on the same data
page.
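
The effect on page fetches can be made concrete with a toy model. The sketch below assumes a page holds four rows and compares the same 16 keys stored in key order against an interleaved order; PAGE_SIZE, the two layouts and the function pages_touched are all invented for illustration.

```python
PAGE_SIZE = 4  # rows per data page (assumed for the example)

def pages_touched(row_order, low, high):
    """Count distinct data pages that hold a key in the range [low, high]."""
    return len({
        position // PAGE_SIZE                 # page number of this row
        for position, key in enumerate(row_order)
        if low <= key <= high
    })

clustered = list(range(16))                   # rows stored in key order
scattered = [0, 4, 8, 12, 1, 5, 9, 13,        # same keys, interleaved so that
             2, 6, 10, 14, 3, 7, 11, 15]      # neighbouring keys sit on different pages

clustered_pages = pages_touched(clustered, 4, 7)
scattered_pages = pages_touched(scattered, 4, 7)
print(clustered_pages, scattered_pages)
```

For the range 4 to 7, the clustered layout needs one page while the interleaved layout needs four, even though both return the same rows.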

In general, only one of the indexes in a table can have a high degree of clustering.

Improving performance with clustering indexes


Clustering indexes can improve the performance of most query operations because they
provide a more linear access path to data, which has been stored in pages. In addition, because
rows with similar index key values are stored together, prefetching is usually more efficient
when clustering indexes are used.

However, clustering indexes cannot be specified as part of the table definition used with the
CREATE TABLE statement. Instead, clustering indexes are only created by executing the
CREATE INDEX statement with the CLUSTER option specified. Then the ALTER TABLE
statement should be used to add a primary key that corresponds to the clustering index created
to the table. This clustering index will then be used as the table's primary key index.
Note: Setting PCTFREE in the table to an appropriate value using the ALTER TABLE statement
can help the table remain clustered by leaving adequate free space to insert rows in the pages
with similar values. For more information, see the ALTER TABLE statement and Reducing the
need to reorganize tables and indexes.

Generally, clustering is more effectively maintained if the clustering index is unique.

Differences between primary key or unique key constraints and unique indexes
It is important to understand that there is no significant difference between a primary key or
unique key constraint and a unique index. The database manager uses a combination of a unique index
and the NOT NULL constraint to implement the relational database concept of primary and
unique key constraints. Therefore, unique indexes do not enforce primary key constraints by
themselves because they allow null values. (Although null values represent unknown values,
when it comes to indexing, a null value is treated as being equal to other null values.)

Therefore, if a unique index consists of a single column, only one null value is allowed; more
than one null value would violate the unique constraint. Similarly, if a unique index consists of
multiple columns, a specific combination of values and nulls can be used only once.

Bi-directional indexes
By default, bi-directional indexes allow scans in both the forward and reverse directions. The
ALLOW REVERSE SCANS clause of the CREATE INDEX statement enables both forward and
reverse index scans, that is, in the order defined at index creation time and in the opposite (or
reverse) order. This option allows you to:
• Facilitate MIN and MAX functions
• Fetch previous keys
• Eliminate the need for the database manager to create a temporary table for the reverse
scan
• Eliminate redundant reverse order indexes
If DISALLOW REVERSE SCANS is specified then the index cannot be scanned in reverse
order. (But physically it will be exactly the same as an ALLOW REVERSE SCANS index.)

1. Bitmap index (typically used on low-cardinality columns)
2. B-tree index (typically used on high-cardinality columns)
