
Binary Tree Structures: Algorithms for Heap


Heap Structures
Demand: keep inserting elements into a list and, at the same time, keep pointing to the largest (or smallest) element.

Queue: good for insertion, but locating the largest element concurrently can be cumbersome.

Requirement: devise a PRIORITY QUEUE structure.
⇒ HEAP is the required structure
Definition :
A Heap is a complete binary tree with the property
that the value at each node is at least as large as the values
at its children ( if they exist)
This implies that the largest is at the root of the heap
if the elements are distinct
If the logic is reversed we get the smallest value in the root.

[Figure: three example trees, labelled "A Heap", "Not a Heap" and "A Heap".]
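The definition can be checked mechanically. The following is a minimal Python sketch, not taken from the slides. It assumes the tree is stored in a list `a` whose slot `a[0]` is unused, so that node i has children 2*i and 2*i + 1 and parent i // 2. The name `is_heap` is a hypothetical helper.

```python
def is_heap(a):
    """Return True if a[1..n] satisfies the (max-)heap property."""
    n = len(a) - 1                      # a[0] is an unused placeholder
    for child in range(2, n + 1):
        parent = child // 2             # integer division gives the parent node
        if a[parent] < a[child]:        # a child larger than its parent breaks the property
            return False
    return True


print(is_heap([None, 80, 45, 70, 40, 35, 50]))      # every parent >= its children
print(is_heap([None, 80, 45, 70, 40, 35, 50, 90]))  # node 7 (90) exceeds its parent, node 3 (70)
```

Checking each child against its parent visits every edge of the tree exactly once, so the sketch makes at most n - 1 comparisons.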
Insertion into a HEAP:

A new item is always added at the bottom of the heap.

Let us consider a heap of six elements:

Node:      1   2   3   4   5   6
Content:  80  45  70  40  35  50

Let us try to insert the seventh element, say 90.

STEP 1: The element is inserted at the bottom-most location, as the latest child.

Node:      1   2   3   4   5   6   7
Content:  80  45  70  40  35  50  90
STEP 2: Check whether the insertion of the content in this child location creates a HEAP.

Latest child ← 7
Corresponding parent ← ⌊7/2⌋ = 3
Children of the parent: Ch1 ← 2 * Parent = 6, Ch2 ← Ch1 + 1 = 7
Corresponding local heap structure: node 3 holds 70, node 6 holds 50, node 7 holds 90.

This does not make a HEAP, since 90 > 70.
Hence the content of parent_node (3) should be swapped with the content of latest_child_node (7).
After the swap: node 3 holds 90, node 6 holds 50, node 7 holds 70.

This implies that the new content has moved to node 3.
Hence the newly identified child node is 3.

STEP 3:
Therefore it is now required to check whether the movement of the content into the node makes a HEAP, treating the node as a child node.

Child node ← 3
Corresponding parent ← ⌊3/2⌋ = 1
Ch1 ← 2, Ch2 ← 3
Corresponding local heap structure: node 1 holds 80, node 2 holds 45, node 3 holds 90.

This does not result in a HEAP, since 90 > 80.
Hence the content of parent_node (1) should be swapped with the content of the newly identified child_node (3).
After the swap: node 1 holds 90, node 2 holds 45, node 3 holds 80.

This implies that the new content has moved to node 1.
Hence the newly identified child node ← 1

STEP 4:
Therefore it is now required to check whether the movement of the content into the node makes a HEAP, treating the node as a child node.

Child node ← 1
Corresponding parent ← ⌊1/2⌋ = 0
Ch1 ← x, Ch2 ← x

This is INVALID: there is no node 0.

INSERTION PROCESS TERMINATES

Node:      1   2   3   4   5   6   7
Content:  90  45  80  40  35  50  70
Algorithm:

The heap is stored in an array A. Locations 1 to N-1 hold the existing heap, and ITEM is placed in the latest location, N.
{
  latest_location ← N
  A(latest_location) ← ITEM
  new_child ← latest_location
Back: corresponding_parent ← ⌊new_child / 2⌋
  if corresponding_parent < 1 then { Insertion is completed
                                     Terminate }
  else
    if A(corresponding_parent) < ITEM
    then { A(new_child) ← A(corresponding_parent)
           A(corresponding_parent) ← ITEM
           new_child ← corresponding_parent
           go Back }
    else { ITEM is settled
           Terminate }
}
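The insertion procedure above can be sketched in Python as follows. This is an illustrative sketch rather than a definitive implementation: it assumes the heap lives in a list with slot 0 unused (so node numbers match the hand simulation), and the returned movement count is an addition made here for use in the complexity discussion.

```python
def heap_insert(a, item):
    """Insert item into the max-heap a[1..N-1]; return the number of movements.

    a[0] is an unused placeholder so that the parent of node c is c // 2.
    """
    a.append(item)                       # A(latest_location) <- ITEM
    new_child = len(a) - 1
    moves = 0
    while new_child // 2 >= 1 and a[new_child // 2] < item:
        parent = new_child // 2
        a[new_child] = a[parent]         # the parent's content moves down
        a[parent] = item                 # ITEM moves up
        new_child = parent
        moves += 1
    return moves


heap = [None, 80, 45, 70, 40, 35, 50]
moves = heap_insert(heap, 90)
print(heap[1:], moves)   # [90, 45, 80, 40, 35, 50, 70] 2
```

The `while` loop replaces the `go Back` jump: it stops either when the parent index drops below 1 or when the parent is not smaller than ITEM.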
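Best Case Insertion: 70 is inserted into the heap 90 45 80 40 35 50. Its parent (node 3, content 80) is larger, so it settles in node 7 without any movement.

Node:               1   2   3   4   5   6   7
Content:           90  45  80  40  35  50  70

Worst Case Insertion: 98 is inserted into the same heap and moves all the way up to the root.

Node:               1   2   3   4   5   6   7
After placement:   90  45  80  40  35  50  98
After first swap:  90  45  98  40  35  50  80
After second swap: 98  45  90  40  35  50  80

One of the Average Case Insertions: 85 is inserted into the same heap and moves up one level only.

Node:               1   2   3   4   5   6   7
After placement:   90  45  80  40  35  50  85
After the swap:    90  45  85  40  35  50  80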


Time Complexity Analysis

Best Case Time Complexity:

The new element settles in the latest location itself.

This does not induce any swapping or any movement.

⇒ Ω(1)
Worst Case Time Complexity:
The latest location will be in the outermost level, say level l.

Then the element moves from

level l → level (l-1) → level (l-2) → . . . → level 3 → level 2 → level 1 → level 0

Therefore, number of movements ∝ l
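The climb from the latest location to the root can be counted directly. The short Python sketch below is an illustration only; `worst_case_moves` is a hypothetical helper that halves the node number until it reaches node 1, which is the path a worst-case insertion follows.

```python
def worst_case_moves(n):
    """Count the parent hops from node n up to the root (node 1)."""
    moves = 0
    while n > 1:
        n //= 2          # move to the parent node
        moves += 1
    return moves


for n in (1, 7, 8, 1000):
    print(n, worst_case_moves(n))
```

For every n >= 1 the count equals the integer part of log2(n), which is the level number of node n.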


Recall:

$l = \log_2(N + 1) - 1$   (1)

$l \approx \log_2 N \Rightarrow O(\log_2 N)$

where N is the total number of vertices, i.e. N indicates the position of the latest location.

[Figure: plot of t against n, with the curve labelled $O(\log_2 N)$.]

Therefore, number of movements ≈ $\log_2 N$

⇒ $O(\log_2 N)$
How to Profile for Average Computing Time?
Create a heap of (n-1) elements.

Let the heap elements be in the range [a, b].

Let the new element to be inserted be < a. It settles in the latest location, so this gives the best computing time (t min).

Let the new element to be inserted be > b. It moves all the way up to the root, so this gives the worst computing time (t max).

Let the element to be inserted be in the range (a, b). Then the computing time t satisfies t min ≤ t ≤ t max.

Repeat the process of randomly creating the new element m times, and correspondingly perform the insertion process m times.

Let the corresponding computing time measures be t1, t2, . . . , tm

Therefore, t avg = (Σ tj) / m where j = 1, 2, . . . , m

Increase the size of the heap, i.e. create a heap of n-1 elements where n is larger than the previous value of n.

PLOT THE GRAPH OF TIME Vs n

[Figure: time plotted against n, with three series of points: t max (upper), t avg (middle) and t min (lower).]

Draw smooth curves to pass through the corresponding points.
Find an expression for t avg: the best-fit expression.
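The profiling procedure can be sketched in Python as below. This is a rough illustration under stated assumptions: the heap is a list with slot 0 unused, timings come from `time.perf_counter`, and the names `heap_insert` and `profile_insert`, the value range and the choices of n and m are all hypothetical. Measured times depend on the machine and will be noisy for a single insertion.

```python
import random
import time


def heap_insert(a, item):
    """Insert item into the heap a[1..] (a[0] unused) by moving it up."""
    a.append(item)
    child = len(a) - 1
    while child > 1 and a[child // 2] < a[child]:
        a[child // 2], a[child] = a[child], a[child // 2]
        child //= 2


def profile_insert(n, m, lo=0, hi=1_000_000):
    """Time m random insertions into a heap of n - 1 elements.

    Returns (t_min, t_avg, t_max) in seconds.
    """
    base = [None]
    for _ in range(n - 1):
        heap_insert(base, random.randint(lo, hi))
    times = []
    for _ in range(m):
        heap = list(base)                # same starting heap for every trial
        item = random.randint(lo, hi)    # randomly created new element
        start = time.perf_counter()
        heap_insert(heap, item)
        times.append(time.perf_counter() - start)
    return min(times), sum(times) / m, max(times)


for n in (1_000, 10_000):
    t_min, t_avg, t_max = profile_insert(n, m=100)
    print(n, t_min, t_avg, t_max)
```

Plotting the three returned values against n gives the three series of points described above.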
Heap Creation by Heap Insertion
for latest_location ← 1 to n
{
  Perform Heap Insertion of the next ITEM, with N = latest_location:
  {
    A(latest_location) ← ITEM
    new_child ← latest_location
  Back: corresponding_parent ← ⌊new_child / 2⌋
    if corresponding_parent < 1 then { Insertion is completed
                                       Terminate }
    else
      if A(corresponding_parent) < ITEM
      then { A(new_child) ← A(corresponding_parent)
             A(corresponding_parent) ← ITEM
             new_child ← corresponding_parent
             go Back }
      else { ITEM is settled
             Terminate }
  }
}
Illustration: Consider the data set {40, 80, 35, 90, 45, 50, 70}. For each latest location, the contents of nodes 1 onwards are shown after placing the new element and after each swap.

Latest location = 1 (insert 40)
  placed:          40                             (no movement)

Latest location = 2 (insert 80)
  placed:          40  80
  swap 2 and 1:    80  40

Latest location = 3 (insert 35)
  placed:          80  40  35                     (no movement)

Latest location = 4 (insert 90)
  placed:          80  40  35  90
  swap 4 and 2:    80  90  35  40
  swap 2 and 1:    90  80  35  40

Latest location = 5 (insert 45)
  placed:          90  80  35  40  45             (no movement)

Latest location = 6 (insert 50)
  placed:          90  80  35  40  45  50
  swap 6 and 3:    90  80  50  40  45  35

Latest location = 7 (insert 70)
  placed:          90  80  50  40  45  35  70
  swap 7 and 3:    90  80  70  40  45  35  50
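The illustration can be reproduced with a few lines of Python. This is a minimal sketch assuming a list with slot 0 unused, so that node numbers match the trace; `build_heap_by_insertion` is a hypothetical name.

```python
def build_heap_by_insertion(items):
    """Create a max-heap by inserting the items one at a time."""
    a = [None]                           # a[0] unused; nodes are numbered from 1
    for item in items:
        a.append(item)                   # place the item in the latest location
        child = len(a) - 1
        while child > 1 and a[child // 2] < a[child]:
            a[child // 2], a[child] = a[child], a[child // 2]   # swap with the parent
            child //= 2
    return a


heap = build_heap_by_insertion([40, 80, 35, 90, 45, 50, 70])
print(heap[1:])   # [90, 80, 70, 40, 45, 35, 50]
```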
TIME COMPLEXITY ANALYSIS

Best Case
Each new element, as and when it is inserted, settles in the latest child location.
n elements are inserted.

Therefore
t best ∝ n ⇒ Ω(n)
Worst Case
Let j be the level number where insertions are currently being made.
There are $2^j$ locations at this level where elements are inserted.
Let us assume that in each case the inserted element moves up to the root, i.e. level 0.

Therefore
t to insert each element ∝ j
⇒ t for all elements inserted in the j-th level ∝ $j \cdot 2^j$

Let us assume that n elements make a heap of l levels.

Then $t_{worst} \propto \sum_{j=0}^{l} j \cdot 2^j = 0 \cdot 2^0 + 1 \cdot 2^1 + 2 \cdot 2^2 + \dots + l \cdot 2^l$

We know the identity

$\sum_{i=1}^{n} i \cdot 2^i = (n - 1)\,2^{n+1} + 2$

Applying it with n = l gives $t_{worst} \propto (l - 1)\,2^{l+1} + 2$, whose dominant term is proportional to $l \cdot 2^l$.

We also know that $l \approx \log_2 n$.

Therefore $t_{worst} \approx l \cdot 2^l \approx \log_2 n \cdot 2^{\log_2 n}$

$\approx n \log_2 n \Rightarrow O(n \log n)$
[Figure: t plotted against n, bounded above by the $O(n \log_2 n)$ curve and below by the Ω(n) curve.]

Is t avg nearer to n or to $n \log_2 n$?

WHY?
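One way to explore the question empirically is to count movements instead of measuring time. The Python sketch below is an illustration only; `insertion_build_moves` is a hypothetical helper, the heap is assumed to be a list with slot 0 unused, and the choice of n = 15 and 1000 random trials is arbitrary.

```python
import random


def insertion_build_moves(items):
    """Build a heap by repeated insertion and return the number of swaps made."""
    a = [None]
    moves = 0
    for item in items:
        a.append(item)
        child = len(a) - 1
        while child > 1 and a[child // 2] < a[child]:
            a[child // 2], a[child] = a[child], a[child // 2]
            child //= 2
            moves += 1
    return moves


n = 15
best = insertion_build_moves(range(n, 0, -1))    # descending input: nothing moves
worst = insertion_build_moves(range(1, n + 1))   # ascending input: every element climbs to the root
trials = [insertion_build_moves(random.sample(range(n), n)) for _ in range(1000)]
print(best, worst, sum(trials) / len(trials))
```

Repeating the experiment for growing n and comparing the average count with n and with n log2(n) gives evidence for the answer.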
Profile:

1. Set a value for n.
2. Generate n random numbers in the range (a, b).
3. Create a heap and note the time t1.
4. Repeat steps 2 and 3 m times with different sets of n random numbers in (a, b), and compute t avg = (Σ tj) / m for j = 1, 2, 3, . . . , m.
5. Repeat the above procedure for increasing values of n.
6. Draw the profile graph.
Some Important Identities

$\sum_{i=1}^{n} i \cdot 2^i = (n - 1)\,2^{n+1} + 2$
  n = 1: $1 \cdot 2^1 = 2 = (1 - 1)\,2^{1+1} + 2$
  n = 2: $1 \cdot 2^1 + 2 \cdot 2^2 = 2 + 8 = 10 = (2 - 1)\,2^{2+1} + 2 = 1 \cdot 2^3 + 2$

$\sum_{i=1}^{n} 2^i = 2^{n+1} - 2$
  n = 2: $2^1 + 2^2 = 2 + 4 = 6 = 2^3 - 2$
  n = 3: $2^1 + 2^2 + 2^3 = 14 = 2^4 - 2$

$2^{\log_2 n} = n$
Hint: Take log on both sides (to base 2).
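The identities can be spot-checked numerically. The snippet below is a small Python sanity check, not a proof; the function names are hypothetical and the range of n tested is arbitrary.

```python
import math


def weighted_power_sum(n):
    """Left-hand side of the first identity: sum of i * 2**i for i = 1..n."""
    return sum(i * 2**i for i in range(1, n + 1))


def power_sum(n):
    """Left-hand side of the second identity: sum of 2**i for i = 1..n."""
    return sum(2**i for i in range(1, n + 1))


for n in range(1, 21):
    assert weighted_power_sum(n) == (n - 1) * 2**(n + 1) + 2
    assert power_sum(n) == 2**(n + 1) - 2
    assert math.isclose(2 ** math.log2(n), n)   # floating point, so compare approximately

print("all three identities hold for n = 1..20")
```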
Heap Creation by Heap Shake / Heap Adjust

The number of elements that make the heap should be known beforehand.
Step 1: Create a complete binary tree.
Consider an instance: 100 119 118 171 112 151 132

Node:       1    2    3    4    5    6    7
Content:  100  119  118  171  112  151  132

Certainly, this is not a heap.
Step 2: Starting from the youngest parent node, start adjusting the HEAP.

Node:       1    2    3    4    5    6    7
Initial:  100  119  118  171  112  151  132

p = 3, Ch1 = 6, Ch2 = 7: the larger child (151) exceeds the parent (118), so nodes 3 and 6 are swapped.
          100  119  151  171  112  118  132

p = p - 1 = 2, Ch1 = 4, Ch2 = 5: the larger child (171) exceeds the parent (119), so nodes 2 and 4 are swapped.
          100  171  151  119  112  118  132

p = p - 1 = 1, Ch1 = 2, Ch2 = 3: the larger child (171) exceeds the parent (100), so nodes 1 and 2 are swapped.
          171  100  151  119  112  118  132

The disturbed node is 2. Hence the heap with node 2 as new parent should get stabilized.
New parent because of disturbance = 2, Ch1 = 4, Ch2 = 5: the larger child (119) exceeds the parent (100), so nodes 2 and 4 are swapped.
          171  119  151  100  112  118  132

The disturbed node is 4. Hence the heap with node 4 as new parent should get stabilized.
New parent because of disturbance = 4, Ch1 = 8, Ch2 = 9

In this binary tree node 4 cannot play the role of a parent node. Hence the downward movement is complete.

Now we should return to the earlier step of going to the next higher parent using p = p - 1.

Recall that all this downward movement started when p = 1.

Therefore p = p - 1 = 1 - 1 = 0
This implies that the adjusting procedure has covered all possible parent nodes.
HENCE HEAP IS FORMED

Node:       1    2    3    4    5    6    7
Content:  171  119  151  100  112  118  132
Let us Devise the Algorithm

Youngest child or latest child = n

Corresponding youngest parent: Pa ← ⌊n / 2⌋

During hand simulation we have seen that we need two control pointers, namely
- Parent counter: let p indicate the parent counter.
  Initially p ← Pa
  Subsequently p ← p - 1 till all parents are exhausted
- Disturbed new parent counter: let P indicate this.
  To begin with p ← Pa
  P ← p
- If there is a movement, the child node into which P's content moves is assigned to P.
ALGORITHM for HEAP - ADJUST
p ← Pa
Back2: P ← p
Back1: If P > Pa then { p ← p - 1; if p < 1 then heap completes
                        else go Back2 }
       else
       { Ch1 ← 2 * P; Ch2 ← Ch1 + 1
         If (Ch2 > n or A(Ch1) > A(Ch2)) then { X ← A(Ch1); L ← Ch1 }
         else { X ← A(Ch2); L ← Ch2 }
         If (X > A(P)) then { A(L) ← A(P); A(P) ← X
                              P ← L; go Back1 }
         else { p ← p - 1; if p < 1 then heap completes
                else go Back2 } }
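The heap-adjust procedure can be sketched in Python as follows. This is an illustrative sketch, not a definitive implementation: it assumes a list with slot 0 unused, it guards against a youngest parent that has only one child (which happens when n is even), and the returned movement count is an addition made here to support profiling and analysis.

```python
def heap_adjust(a):
    """Turn a[1..n] into a max-heap in place; return the number of movements.

    a[0] is an unused placeholder so that node P has children 2*P and 2*P + 1.
    """
    n = len(a) - 1
    moves = 0
    for p in range(n // 2, 0, -1):       # parent counter: youngest parent down to the root
        P = p                            # disturbed new parent counter
        while 2 * P <= n:                # P still has at least one child
            L = 2 * P                    # Ch1
            if L + 1 <= n and a[L + 1] > a[L]:
                L += 1                   # Ch2 exists and holds the larger content
            if a[L] <= a[P]:
                break                    # P is settled
            a[L], a[P] = a[P], a[L]      # content of P moves down to L
            P = L
            moves += 1
    return moves


tree = [None, 100, 119, 118, 171, 112, 151, 132]
moves = heap_adjust(tree)
print(tree[1:], moves)   # [171, 119, 151, 100, 112, 118, 132] 4
```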
ANALYSIS

[Figure: a heap drawn from the 0th level down to the l-th level; a parent P at the i-th level can move down through a depth of l - i.]

Total number of nodes in the i-th level is $2^i$.

Maximum number of movements for each element in the i-th level is $(l - i)$.

Therefore the total number of worst case movements for the i-th level = $(l - i)\,2^i$.

This is true for all parents at i = l-1, l-2, . . . , 2, 1, 0.

Therefore

$t \propto (l - (l-1))\,2^{l-1} + (l - (l-2))\,2^{l-2} + \dots + (l-2)\,2^2 + (l-1)\,2^1 + l \cdot 2^0$

$t \propto l \cdot 2^0 + (l-1)\,2^1 + (l-2)\,2^2 + \dots + (l - (l-2))\,2^{l-2} + (l - (l-1))\,2^{l-1}$

$t \propto l + (l-1)\,2^1 + (l-2)\,2^2 + \dots + (l - (l-2))\,2^{l-2} + 2^{l-1}$

$t \propto [\,l + l \cdot 2^1 + l \cdot 2^2 + \dots + l \cdot 2^{l-2}\,] - [\,1 \cdot 2^1 + 2 \cdot 2^2 + 3 \cdot 2^3 + \dots + (l-2)\,2^{l-2}\,] + 2^{l-1}$

$t \propto l\,[\,2^0 + 2^1 + 2^2 + \dots + 2^{l-2}\,] - [\,1 \cdot 2^1 + 2 \cdot 2^2 + 3 \cdot 2^3 + \dots + (l-2)\,2^{l-2}\,] + 2^{l-1}$

We know that

$\sum_{i=1}^{n} i \cdot 2^i = (n - 1)\,2^{n+1} + 2$

$\sum_{i=1}^{n} 2^i = 2^{n+1} - 2$

Applying both with n = l - 2:

$t \propto l\,[\,2^0 + 2^{l-1} - 2\,] - [\,(l-3)\,2^{l-1} + 2\,] + 2^{l-1}$

$t \propto l + l \cdot 2^{l-1} - 2l - l \cdot 2^{l-1} + 3 \cdot 2^{l-1} - 2 + 2^{l-1}$

$t \propto -l + 4 \cdot 2^{l-1} - 2$

$t \propto -l + 2^2 \cdot 2^{l-1} - 2 = -l + 2^{l-1+2} - 2 = -l + 2^{l+1} - 2$

Since $l = \log_2(n + 1) - 1$, we have $2^{l+1} = n + 1$, so

$t \approx 2^{l+1} \approx n \Rightarrow O(n)$
1. Profile the Heap adjust algorithm.
2. Devise an algorithm to recognize whether a complete binary tree is a HEAP or NOT. Is it an Ω - O algorithm or a Θ complexity algorithm? Can there be an alternate way to develop this algorithm?
3. Redraft the Heap adjust algorithm starting from the eldest parent node instead of starting from the youngest parent node.
4. What is the Ω of the Heap adjust algorithm?
5. Compare and contrast Heap creation by insertion and Heap creation by adjustment.
6. Given a Heap (for example 171 119 151 100 112 118 132 in nodes 1 to 7), devise an algorithm to locate
   1. First Largest Element
   2. Second Largest Element
   3. Third Largest Element
   Derive Complexity Measures.
HEAP SORT
Background:
We know that the largest element sits at the heap apex.

This is analogous to saying that the largest of a set of n elements is bubbled up to the heap apex.

If the largest is removed and the heap is then remade with the remaining n-1 elements, the largest of those n-1 elements bubbles up to the heap apex.

Hence the skeleton algorithm to sort a set of n elements can be as follows: HEAP SORT

Step 1: Make a heap of n elements (last = n)

Step 2: Swap the contents of the first node (root/apex) with the contents of the last node (youngest child)

Step 3: Cut the last node

Step 4: Re-define last = last - 1

Step 5: Remake the heap of last number of elements

Step 6: Repeat steps 2 to 5 till last becomes zero
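The six steps can be sketched in Python as below. This is a minimal sketch under stated assumptions: the list keeps slot 0 unused so node numbers match the slides, Step 1 builds the heap by adjusting from the youngest parent, and "cutting" the last node is modelled by shrinking `last` while the cut elements stay in the same list. The names `sift_down` and `heap_sort` are hypothetical.

```python
def sift_down(a, parent, last):
    """Move a[parent] down until the heap a[1..last] is restored."""
    while 2 * parent <= last:
        child = 2 * parent
        if child + 1 <= last and a[child + 1] > a[child]:
            child += 1                       # pick the larger child
        if a[child] <= a[parent]:
            break
        a[parent], a[child] = a[child], a[parent]
        parent = child


def heap_sort(items):
    """Return the items in ascending order using heap sort."""
    a = [None] + list(items)                 # a[0] unused
    last = len(a) - 1
    for p in range(last // 2, 0, -1):        # Step 1: make a heap of n elements
        sift_down(a, p, last)
    while last > 1:                          # a heap of one element needs no further work
        a[1], a[last] = a[last], a[1]        # Step 2: swap apex and last node
        last -= 1                            # Steps 3 and 4: cut the last node
        sift_down(a, 1, last)                # Step 5: remake the heap
    return a[1:]


print(heap_sort([100, 119, 118, 171, 112, 151, 132]))
# [100, 112, 118, 119, 132, 151, 171]
```

The loop stops when one element remains, since a single element is already in place; this gives the same result as continuing until last becomes zero.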


Illustration:

Consider a heap of 7 elements. LAST = 7. LARGEST = 171 is in the heap apex.

Node:       1    2    3    4    5    6    7
Content:  171  119  151  100  112  118  132

Swap the contents of last and apex:

Content:  132  119  151  100  112  118  171

Cut the last.

Redefine last = last - 1 = 6
Reconstruct the heap of 6 elements (node 7 has been cut):

Node:       1    2    3    4    5    6  |   7
Content:  132  119  151  100  112  118  | 171

Note that here the disturbed node is the apex (root).
Hence, while re-heapifying, the maximum number of movements takes place if the content of the apex flows down to the pendant level.

Hence t worst ∝ maximum movement
      ∝ movement from the zeroth level down to the l-th level
Therefore t worst ∝ l ≈ $\log_2 n$ ⇒ $O(\log_2 n)$
Heap of 6 elements made (132 moves down to node 3 and 151 moves up to the apex):

Node:       1    2    3    4    5    6  |   7
Content:  151  119  132  100  112  118  | 171

Repeat:
Swap last and first.
Cut the last. Redefine last = last - 1
Re-Heapify

The remaining passes are listed below. Contents to the right of the bar have been cut from the heap and are in their final positions.

last   after swap and cut                        after re-heapify
 5     118  119  132  100  112 | 151  171        132  119  118  100  112 | 151  171
 4     112  119  118  100 | 132  151  171        119  112  118  100 | 132  151  171
 3     100  112  118 | 119  132  151  171        118  112  100 | 119  132  151  171
 2     100  112 | 118  119  132  151  171        112  100 | 118  119  132  151  171
 1     100 | 112  118  119  132  151  171        100 | 112  118  119  132  151  171

Sorted sequence:

Node:       1    2    3    4    5    6    7
Content:  100  112  118  119  132  151  171

t ∝ $\log_2 n$ for one swapping of root and last

t ∝ $n \log_2 n$ for all n swappings of root and last

⇒ HEAP SORT IS $O(n \log_2 n)$


1. Develop the algorithm for Heap Sort and work out the exact
computing time. Profile the algorithm
2. Given a set of n elements construct a heap with the largest in the
root apex and then heap sort them in descending order. Work
out the exact computing time. Represent it in order notation
3. Generate the worst case and best case example if possible
4. Instead of shifting one element at a time can it be possible to shift
two largest elements at a time ? (Should be possible). Redesign
the algorithm. Re-work out complexity measures.
5. Write an algorithm to trace the number of shuffling suffered by
each element before the element gets into a sorted sequence
during heap sort. Profile this algorithm
A Complete study on Hashing algorithms
Note: The performance analysis involves a detailed
study of synonyms, Collisions, Overflow, Loading
density, Retrieval access – Probing in addition to
time and space analysis
To be continued . . .
