
Binary Trees

A binary search tree (BST) is a node-based binary tree data structure with the following properties:

The left subtree of a node contains only nodes with keys less than the node's key.
The right subtree of a node contains only nodes with keys greater than the node's key.
Both the left and right subtrees must also be binary search trees.

From the above properties it naturally follows that:

Each node (item in the tree) has a distinct key.
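The three properties above can be checked mechanically. The following is a minimal sketch, assuming integer keys and the strict "less than" / "greater than" ordering listed above; the names Node and isBst are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node type for illustration; not taken from the text.
struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Returns true if the subtree rooted at n is a binary search tree whose keys
// all lie strictly between *lo and *hi. A null bound means "no bound".
bool isBst(const Node* n, const int* lo = nullptr, const int* hi = nullptr) {
    if (!n) return true;                      // an empty tree is a BST
    if (lo && n->key <= *lo) return false;    // key must exceed the lower bound
    if (hi && n->key >= *hi) return false;    // key must stay below the upper bound
    // Keys to the left are capped by this key; keys to the right start above it.
    return isBst(n->left.get(), lo, &n->key) &&
           isBst(n->right.get(), &n->key, hi);
}
```

Passing the bounds down the recursion matters: comparing each node only with its immediate children would accept a tree in which a deeper node violates the ordering with respect to an ancestor.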

Below is a tree that will help us define some terms related to trees in general.

The root node (or simply root) is the node at the top of the tree
diagram (110).

The parent of a node is the one directly connected to it from above. In our
example, 111 is the parent of 350, 230 is the parent of 310, 110 has no
parent, etc.

A child is a node connected directly below the starting node. Thus 350 is a
child of 111, 310 is a child of 230, etc. It is simply the reverse of the parent
relationship. Nodes with the same parent are called siblings. So, 221, 230,
and 350 are siblings in our example.

An ancestor of a given node is either the parent, the parent of the parent,
the parent of that, etc. In our example 110 is an ancestor of all of the other nodes in the tree. The counterpart of
ancestor is descendant. For example, 310 is a descendant of 111, but 310 is not a descendant of 221.

The leaves of a tree (sometimes also called external nodes) are those nodes with no children. In our example, 221,
350, 330, and 310 are leaves. Note that leaves do not have to be at the lowest level in the tree. The other nodes of the
tree are called non-leaves (or sometimes internal nodes).

A branch is a sequence of nodes such that the first is the parent of the second, the second is the parent of the third,
etc. For example, in the above tree, the sequence 111, 230, 310 is a branch. The length of a branch is the number of
line segments traversed (which is one less than the number of nodes). The above branch has length 2.

The height of a tree is the maximum length of a branch from the root to a leaf. The above tree has height 3, since the
longest possible branch from root to leaf is either 110, 111, 230, 310 or 110, 111, 230, 330, both of which have length
3.
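The height definition translates directly into a short recursive function. The sketch below is illustrative only: TreeNode, add, height, and buildExample are hypothetical names, and the example tree is rebuilt from the parent, child, and leaf relationships stated in the prose (the order of the children is a guess, which does not affect the height).

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// A general tree node: any number of children, no left/right distinction.
struct TreeNode {
    int key;
    std::vector<std::unique_ptr<TreeNode>> children;
    explicit TreeNode(int k) : key(k) {}

    // Attaches a new child and returns a pointer to it for further building.
    TreeNode* add(int k) {
        children.push_back(std::make_unique<TreeNode>(k));
        return children.back().get();
    }
};

// Height = number of line segments on the longest branch from n down to a
// leaf. A leaf has height 0. Assumes a non-empty tree.
int height(const TreeNode& n) {
    int best = -1;
    for (const auto& c : n.children) best = std::max(best, height(*c));
    return best + 1;
}

// The example tree, as inferred from the text: 110 is the root, 111 is its
// only child, 111 has children 221, 230, 350, and 230 has children 310, 330.
std::unique_ptr<TreeNode> buildExample() {
    auto root = std::make_unique<TreeNode>(110);
    TreeNode* n111 = root->add(111);
    n111->add(221);
    TreeNode* n230 = n111->add(230);
    n111->add(350);
    n230->add(310);
    n230->add(330);
    return root;
}
```

Calling height(*buildExample()) on this reconstruction gives 3, matching the branch 110, 111, 230, 310.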

A more formal definition of a tree can be written as follows: A (general) tree consists of a set of nodes that is either
empty or has a root node to which are attached zero or more subtrees. Of course, a subtree itself must be a tree. Thus
this is a recursive definition. Note, too, that there is no left to right ordering of the subtrees.

What is a Binary Tree?

A binary tree is similar to a tree, but not quite the same. For a binary tree, each node can have zero, one, or two
children. In addition, each child node is clearly identified as either the left child or the right child. Thus there is a
definite left-right ordering of the child nodes. Even when a node has only one child, that child must be identified as
either the left child or the right child. As an example, here is the binary expression tree for the expression 4 * 5 - 3.
Note that a child drawn to the left of its parent is meant to be the left child and that a child drawn to the right of its
parent is meant to be the right child.

What is a Binary Search Tree?

A binary search tree is a binary tree in which the data in the nodes is ordered in a particular way. To be precise,
starting at any given node, the data in any nodes of its left subtree must all be less than the item in the given node,
and the data in any nodes of its right subtree must be greater than or equal to the data in the given node. Of course,
all of this implies that the data items can be ordered by some sort of less than relationship. For numbers this can
obviously be done. For strings, alphabetical ordering is often used. For records of data, a comparison based on a
particular field (the key field) is often used.

The following is a binary search tree where each node contains a person's name. Only first names are used in order to
keep the example simple. Note that the names are ordered alphabetically so that DAVE comes before DAWN, DAVID
comes before DAWN but after DAVE, etc.

How does one create a binary search tree in the first place? One way to do so is by starting with an empty binary
search tree and adding the data items one by one. The first item becomes the root. The next item is placed in either a
left child or right child node, depending on the ordering. The third item is compared to the root, and we go left or right
depending on the result of the comparison, and so on until we find the spot for it. In each case we follow a path from the
root to the spot where we insert the new item, comparing the new item to each item in the nodes encountered along
the way, going left or right at each node so as to maintain the ordering prescribed above.
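The insertion procedure just described might be sketched as follows. This is one possible rendering, not a definitive implementation: Node and insert are hypothetical names, keys are strings compared with the standard < operator, and equal keys are sent to the right, following the "greater than or equal to" wording above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node type holding one name.
struct Node {
    std::string key;
    std::unique_ptr<Node> left, right;
    explicit Node(std::string k) : key(std::move(k)) {}
};

// Walks down from the root, going left when key is smaller than the current
// node's key and right otherwise, until an empty spot is found; the new node
// is placed there.
void insert(std::unique_ptr<Node>& root, const std::string& key) {
    std::unique_ptr<Node>* slot = &root;
    while (*slot) {
        slot = (key < (*slot)->key) ? &(*slot)->left : &(*slot)->right;
    }
    *slot = std::make_unique<Node>(key);
}
```

Starting from an empty root and inserting DAWN, DAVE, MIKE, and DAVID in that order puts DAWN at the root, DAVE to its left, MIKE to its right, and DAVID as the right child of DAVE.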

Traversals of Binary Trees

There are several well-known ways to traverse (that is, to travel throughout) a binary tree. We will look at three of them. The
first is an inorder traversal. This consists of three overall steps: traverse the left subtree
(recursively), visit the root node, and traverse the right subtree (recursively). When we
"visit" a node we typically do some processing on it, such as printing out the contents of
the node. For example, an inorder traversal of the above binary search tree gives us the
names in this order:

BETH
CINDI
DAVE
DAVID
DAWN
GINA
MIKE
PAT
SUE

Note that we got the data back in ascending order. This will always happen when doing an inorder traversal of a binary
search tree. In fact, a sort can be done this way. One first inserts the data into a binary search tree and then does an
inorder traversal to obtain the data in ascending order. Some people call this a tree sort.
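A tree sort along these lines could look like the sketch below. The names Node, insert, inorder, and treeSort are assumptions made for illustration, and "visiting" a node here means appending its key to a vector rather than printing it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node {
    std::string key;
    std::unique_ptr<Node> left, right;
    explicit Node(std::string k) : key(std::move(k)) {}
};

// Recursive insertion: smaller keys go left, all others go right.
void insert(std::unique_ptr<Node>& n, const std::string& key) {
    if (!n) { n = std::make_unique<Node>(key); return; }
    insert(key < n->key ? n->left : n->right, key);
}

// Inorder traversal: left subtree, then the node itself, then right subtree.
void inorder(const Node* n, std::vector<std::string>& out) {
    if (!n) return;
    inorder(n->left.get(), out);
    out.push_back(n->key);          // the "visit" step
    inorder(n->right.get(), out);
}

// Tree sort: build a binary search tree, then read it back in order.
std::vector<std::string> treeSort(const std::vector<std::string>& items) {
    std::unique_ptr<Node> root;
    for (const auto& item : items) insert(root, item);
    std::vector<std::string> sorted;
    inorder(root.get(), sorted);
    return sorted;
}
```

Feeding the nine names in any order to treeSort returns them in the ascending order listed above.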

Now, how exactly did we get the above list of data? Essentially, we did so by following the recursive definition for an
inorder traversal. First we traverse the left subtree of the root, DAWN. That left subtree is the one rooted at DAVE.
How do we traverse it? By using the same three-step process. We first traverse its left subtree, the one rooted at
BETH. Of course, we then have to go through the three steps on the subtree rooted at BETH. We begin by traversing
its left subtree, but it is empty, so we visit the root, BETH. That is the first data item printed. Then we traverse the right
subtree, the one rooted at CINDI. We use the three-step process on it, but since its subtrees are empty, we simply
print the root, CINDI, which is the second item printed. We then back up to where we left off with the subtree rooted
at DAVE. We have now traversed its left subtree, so we go on to print the root, DAVE, and then traverse the right
subtree. Since the right subtree itself has empty subtrees, we end up just printing its root, DAVID. We continue in a
similar fashion for the rest of this binary search tree.

The other two traversals that we will study are the preorder traversal and the postorder traversal. They are very
similar to the inorder traversal in that they consist of the same three steps, but reordered slightly. The preorder
traversal puts the step of visiting the root first. The postorder traversal puts the step of visiting the root last.
Everything else stays the same. Here is an outline of the steps for all three of our traversals.
Preorder traversal
1. Visit the root
2. Traverse the left subtree
3. Traverse the right subtree

Inorder traversal
1. Traverse the left subtree
2. Visit the root
3. Traverse the right subtree

Postorder traversal
1. Traverse the left subtree
2. Traverse the right subtree
3. Visit the root
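The three outlines above differ only in where the visit step sits, which is easy to see in code. Here is a minimal sketch, using the binary expression tree for 4 * 5 - 3 introduced earlier; Node, visit, and buildExample are hypothetical names, and visiting a node appends its value to a string instead of printing it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Node {
    std::string value;
    std::unique_ptr<Node> left, right;
    Node(std::string v, std::unique_ptr<Node> l = nullptr,
         std::unique_ptr<Node> r = nullptr)
        : value(std::move(v)), left(std::move(l)), right(std::move(r)) {}
};

// "Visiting" a node: append its value, separated by single spaces.
void visit(const Node& n, std::string& out) {
    if (!out.empty()) out += ' ';
    out += n.value;
}

void preorder(const Node* n, std::string& out) {
    if (!n) return;
    visit(*n, out);                    // 1. visit the root
    preorder(n->left.get(), out);      // 2. traverse the left subtree
    preorder(n->right.get(), out);     // 3. traverse the right subtree
}

void inorder(const Node* n, std::string& out) {
    if (!n) return;
    inorder(n->left.get(), out);       // 1. traverse the left subtree
    visit(*n, out);                    // 2. visit the root
    inorder(n->right.get(), out);      // 3. traverse the right subtree
}

void postorder(const Node* n, std::string& out) {
    if (!n) return;
    postorder(n->left.get(), out);     // 1. traverse the left subtree
    postorder(n->right.get(), out);    // 2. traverse the right subtree
    visit(*n, out);                    // 3. visit the root
}

// Expression tree for 4 * 5 - 3: root "-", left child "*" (with children
// 4 and 5), right child 3.
std::unique_ptr<Node> buildExample() {
    auto product = std::make_unique<Node>("*", std::make_unique<Node>("4"),
                                          std::make_unique<Node>("5"));
    return std::make_unique<Node>("-", std::move(product),
                                  std::make_unique<Node>("3"));
}
```

On this tree the three functions produce "- * 4 5 3", "4 * 5 - 3", and "4 5 * 3 -" respectively.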


As an example, let's do a postorder traversal of the binary expression tree for 4 * 5 - 3 that we looked at earlier. The
tree is shown again for convenience:

First we traverse the left subtree of the whole binary tree. This is the subtree rooted at *. To do so, we apply our three
steps. We traverse its left subtree, which results in printing 4. Then we traverse the right subtree, which results in
printing 5. Then we visit the root, printing *. Next, we back up to where we left off with the whole binary tree. We
have now traversed the left subtree, so we traverse the right subtree, printing 3. Then we visit the root, printing -.
Overall we end up printing 4 5 * 3 -, the postfix form of the expression. A postfix expression is deciphered by looking at
it left to right and using the fact that each operator (such as *) applies to the two previous values. Note that the
traversal always works like this: a postorder traversal of a binary expression tree yields the postfix form of the
expression. You may be familiar with postfix expressions in that some calculators use them. In ordinary mathematics
we are used to using infix expressions, where operators such as + and * come between the two values to which they
apply.
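The left-to-right rule for reading a postfix expression is commonly implemented with a stack: values are pushed as they are read, and each operator pops the two most recent values and pushes the result. The sketch below assumes space-separated tokens, integer operands, and only the operators +, -, *, and /; evalPostfix is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Evaluates a space-separated postfix expression over integers.
// Throws std::invalid_argument if the expression is malformed.
int evalPostfix(const std::string& expr) {
    std::stack<int> values;
    std::istringstream in(expr);
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            if (values.size() < 2)
                throw std::invalid_argument("operator needs two operands");
            // The operator applies to the two previous values.
            int rhs = values.top(); values.pop();
            int lhs = values.top(); values.pop();
            if (tok == "+") values.push(lhs + rhs);
            else if (tok == "-") values.push(lhs - rhs);
            else if (tok == "*") values.push(lhs * rhs);
            else {
                if (rhs == 0) throw std::invalid_argument("division by zero");
                values.push(lhs / rhs);   // integer division
            }
        } else {
            values.push(std::stoi(tok));  // throws if tok is not a number
        }
    }
    if (values.size() != 1)
        throw std::invalid_argument("malformed expression");
    return values.top();
}
```

For the postfix form obtained above, evalPostfix("4 5 * 3 -") returns 17, the same value as the infix expression 4 * 5 - 3.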

For practice try a preorder traversal of the same binary expression tree for 4 * 5 - 3. The result should be - * 4 5 3. This
is the prefix form of the expression, that is, the form in which each binary operator precedes the two quantities to
which it applies. A preorder traversal of a binary expression tree always gives the prefix form of the expression.

The natural conjecture, then, would be that the inorder traversal of a binary expression tree would produce the infix
form of the expression, but that is not quite true. With the above expression it is true. However, try the infix
expression (12 - 3) * 2. Here parentheses are used to indicate that the subtraction should be done before the
multiplication. The binary expression tree looks like this:

As you can verify, an inorder traversal of this binary expression tree produces 12 - 3 * 2, which is the infix form of a
slightly different expression, one in which the multiplication is done before the subtraction. The problem is that we did
not get the parentheses back. It is possible to modify the code for an inorder traversal so that it always parenthesizes
things, but a plain inorder traversal does not give any parentheses.
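One way to make that modification is to wrap every operator node's output in parentheses. The following sketch is an assumption about how this could be done, not the author's code; it assumes every operator node has exactly two children and every operand is a leaf, and the names Node, plainInorder, parenthesized, and buildExample are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Node {
    std::string value;
    std::unique_ptr<Node> left, right;
    Node(std::string v, std::unique_ptr<Node> l = nullptr,
         std::unique_ptr<Node> r = nullptr)
        : value(std::move(v)), left(std::move(l)), right(std::move(r)) {}
};

// Plain inorder traversal: loses the grouping of the original expression.
std::string plainInorder(const Node* n) {
    if (!n) return "";
    if (!n->left && !n->right) return n->value;   // leaf: an operand
    return plainInorder(n->left.get()) + " " + n->value + " " +
           plainInorder(n->right.get());
}

// Inorder traversal that parenthesizes every operator node, so the grouping
// encoded by the tree shape is always preserved.
std::string parenthesized(const Node* n) {
    if (!n) return "";
    if (!n->left && !n->right) return n->value;   // leaf: an operand
    return "(" + parenthesized(n->left.get()) + " " + n->value + " " +
           parenthesized(n->right.get()) + ")";
}

// Expression tree for (12 - 3) * 2: root "*", left child "-" (with children
// 12 and 3), right child 2.
std::unique_ptr<Node> buildExample() {
    auto diff = std::make_unique<Node>("-", std::make_unique<Node>("12"),
                                       std::make_unique<Node>("3"));
    return std::make_unique<Node>("*", std::move(diff),
                                  std::make_unique<Node>("2"));
}
```

On this tree plainInorder yields "12 - 3 * 2" while parenthesized yields "((12 - 3) * 2)". The output has more parentheses than strictly necessary, which is the price of not consulting operator precedence.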

Uses of a Binary Search Tree

A binary search tree can be a very useful data structure. We have already seen that it can be used to create a sort
routine. Such a sort routine is normally pretty fast. In fact, it is Theta(n * lg(n)) on the average. However, it does have a
bad worst case, namely when the data is already in ascending or descending order. (Try it. Start with a list of data
items in order and insert them one by one into a binary search tree. What does this tree look like? Why would this
make each insertion, and hence the sort as a whole, slow?)
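The experiment suggested above can be run with a few lines of code. This is a small sketch under stated assumptions: integer keys, no rebalancing, and height measured in line segments with the empty tree given height -1; Node, insert, height, and heightAfterInserting are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Plain BST insertion with no rebalancing.
void insert(std::unique_ptr<Node>& n, int key) {
    if (!n) { n = std::make_unique<Node>(key); return; }
    insert(key < n->key ? n->left : n->right, key);
}

// Height in line segments; an empty tree is given height -1 so that a
// single node has height 0.
int height(const Node* n) {
    if (!n) return -1;
    return 1 + std::max(height(n->left.get()), height(n->right.get()));
}

// Inserts the keys in the given order and reports the resulting height.
int heightAfterInserting(const std::vector<int>& keys) {
    std::unique_ptr<Node> root;
    for (int k : keys) insert(root, k);
    return height(root.get());
}
```

Inserting 1, 2, 3, 4, 5, 6, 7 in order gives height 6, since every node has only a right child and the tree is effectively a linked list. Inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 gives height 2.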

Another use of a binary search tree is in storing data items for fast lookup later. In the average case it is fast to
insert a new item into a binary search tree, because in the average case the data is fairly random and the tree is
reasonably "bushy". (In such a tree the height is Theta(lg(n)), so insertion is a Theta(lg(n)) operation.) Similarly,
looking up an item already in the tree follows the same path that was used when it was inserted. Thus lookup is also
Theta(lg(n)) on average.

For example, to look up GINA in the binary search tree above, one compares GINA to DAWN, the root. Since GINA is larger,
move to the right child, MIKE. Now compare GINA to MIKE. Since GINA is smaller, move to the left child, GINA. Now
compare GINA to the item in the node, also GINA, and we see that we have a match. All lookups are like this. One
starts at the root and follows a path downward until reaching the matching item (or an empty spot, if no match is found).
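The lookup walk can be sketched as a short loop. In the code below, Node, insert, find, and buildNames are hypothetical names, and the tree is rebuilt by inserting the nine names in an order chosen to match the relationships mentioned in the text (DAWN at the root, MIKE as its right child, GINA as the left child of MIKE); the exact placement of PAT and SUE is a guess.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node {
    std::string key;
    std::unique_ptr<Node> left, right;
    explicit Node(std::string k) : key(std::move(k)) {}
};

void insert(std::unique_ptr<Node>& n, const std::string& key) {
    if (!n) { n = std::make_unique<Node>(key); return; }
    insert(key < n->key ? n->left : n->right, key);
}

// Searches for key starting at n. Returns the matching node, or nullptr if
// the walk runs off the tree. If path is non-null, every key compared along
// the way is appended to it.
const Node* find(const Node* n, const std::string& key,
                 std::vector<std::string>* path = nullptr) {
    while (n) {
        if (path) path->push_back(n->key);
        if (key == n->key) return n;                          // match
        n = (key < n->key) ? n->left.get() : n->right.get();  // go left or right
    }
    return nullptr;
}

// Rebuilds a tree of the nine names; the insertion order is an assumption.
std::unique_ptr<Node> buildNames() {
    std::unique_ptr<Node> root;
    for (const char* name : {"DAWN", "DAVE", "MIKE", "BETH", "DAVID",
                             "GINA", "CINDI", "PAT", "SUE"}) {
        insert(root, name);
    }
    return root;
}
```

Looking up GINA in this tree compares against DAWN, MIKE, and GINA in turn, three comparisons in all. Looking up a name that is absent returns nullptr once the walk reaches an empty subtree.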

Expression Trees

Another application of trees is to store mathematical expressions such as 15*(x+y) or sqrt(42)+7 in a convenient form.
Let's stick for the moment to expressions made up of numbers and the operators +, -, *, and /. Consider the
expression 3*((7+1)/4)+(17-5). This expression is made up of two subexpressions, 3*((7+1)/4) and (17-5), combined
with the operator "+". When the expression is represented as a binary tree, the root node holds the operator +, while
the subtrees of the root node represent the subexpressions 3*((7+1)/4) and (17-5). Every node in the tree holds either
a number or an operator. A node that holds a number is a leaf node of the tree. A node that holds an operator has two
subtrees representing the operands to which the operator applies. The tree is shown in the illustration below. I will
refer to a tree of this type as an expression tree.

Given an expression tree, it's easy to find the value of the expression that it represents. Each node in the tree has an
associated value. If the node is a leaf node, then its value is simply the number that the node contains. If the node
contains an operator, then the associated value is computed by first finding the values of its child nodes and then
applying the operator to those values. The red arrows in the illustration show the process. The value computed for the
root node is the value of the expression as a whole. There are other uses for expression trees. For example, a
postorder traversal of the tree will output the postfix form of the expression.

An expression tree contains two types of nodes: nodes that contain numbers and nodes that contain operators.
Furthermore, we might want to add other types of nodes to make the trees more useful, such as nodes that contain
variables. If we want to work with expression trees in C++, how can we deal with this variety of nodes? One way, which
will be frowned upon by object-oriented purists, is to include an instance variable in each node object to record which
type of node it is:
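One possible shape for such a node is sketched here. It is a stand-in written for illustration, not the author's own listing: the names NodeKind, ExpNode, num, bin, and value are all assumptions, and only the operators +, -, *, and / are handled. The value function follows the evaluation rule described earlier: a leaf's value is its number, and an operator node's value is obtained by evaluating both children and applying the operator.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// The instance variable that records which type of node this is.
enum class NodeKind { Number, Operator };

struct ExpNode {
    NodeKind kind = NodeKind::Number;
    double number = 0.0;   // meaningful only when kind == NodeKind::Number
    char op = '\0';        // meaningful only when kind == NodeKind::Operator
    std::unique_ptr<ExpNode> left, right;   // operands of an operator node
};

// Makes a leaf node holding a number.
std::unique_ptr<ExpNode> num(double v) {
    auto n = std::make_unique<ExpNode>();
    n->kind = NodeKind::Number;
    n->number = v;
    return n;
}

// Makes an operator node with two operand subtrees.
std::unique_ptr<ExpNode> bin(char op, std::unique_ptr<ExpNode> l,
                             std::unique_ptr<ExpNode> r) {
    auto n = std::make_unique<ExpNode>();
    n->kind = NodeKind::Operator;
    n->op = op;
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Computes the value of the expression represented by the tree rooted at n.
// Assumes every operator node has both children.
double value(const ExpNode& n) {
    if (n.kind == NodeKind::Number) return n.number;
    double l = value(*n.left);
    double r = value(*n.right);
    switch (n.op) {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        case '/': return l / r;
    }
    throw std::invalid_argument("unknown operator");
}
```

Building the tree for 3*((7+1)/4)+(17-5) with these helpers and calling value on its root gives 18. The trade-off of the tag approach is that every function working on the tree must test kind and branch accordingly, which is what a class hierarchy with virtual functions would avoid.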
