
Data Structure

Mrs. Geetha.V
Asst. Prof.
Dept. of IT
NITK, Surathkal
Index
• Introduction
• Algorithm Efficiency
• Big O notation
• Searching and sorting
• Algorithm design techniques
Introduction
• Pseudocode:
– One of the most common tools for defining
algorithms is pseudocode, which is part
English, part structured code.
• Algorithm Header
– Describes the parameters and lists any pre- or
post-conditions. The header should also
indicate the return value.
Introduction
Algorithm search (val list <array>,
                  val argument <integer>,
                  ref location <index>)
Search array for a specific item and return its location
Pre    list contains the data array to be searched
       argument contains the value to be searched for
Post   location contains the index of the element matching
       argument, or is undetermined if not found
Return <boolean> True if found, False if not found
Introduction
Algorithm average
Pre:  nothing
Post: numbers and average are printed
1. i = 0
2. sum = 0
3. loop (not end of file)
   1. read number
   2. print (number)
   3. sum = sum + number
   4. i = i + 1
4. end loop
5. avg = sum / i
6. print (avg)
7. end average
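As a concrete illustration (not part of the original slides), the same algorithm in Java, reading numbers from standard input until end of file:

import java.util.Scanner;

public class Average {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int i = 0;
        double sum = 0;
        while (in.hasNextDouble()) {        // loop (not end of file)
            double number = in.nextDouble();
            System.out.println(number);     // print (number)
            sum += number;
            i++;
        }
        if (i > 0) System.out.println(sum / i);  // print (avg)
    }
}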
Introduction
• Data structure :
– A combination of elements each of which is
either a data type or another data structure.
– A set of associations or relationships
(structure) involving the combined elements
• Abstract data type (ADT)
– ADT consists of a set of definitions that allow
programmers to use the functions while hiding
the implementation.
– e.g., linked list, tree, network, etc.
Algorithm efficiency
How do we compare two different algorithms
that solve the same problem?

Solution 1: count the number of statements
executed to complete the task.

f(n) = efficiency, i.e., a function of the
number of elements to be processed.
Algorithm efficiency
• If an algorithm contains no loops, its efficiency
is simply the number of statements it executes.
• Linear loops
1. i=1
2 . loop (i <=1000)
1. application code
2. i=i+1;
3. end loop
Here number of iterations = 1000.
f(n) = n
Algorithm efficiency
Linear loops
1. i=1
2 . loop (i <=1000)
1. application code
2. i=i+2;
3. end loop
Here number of iterations = 1000/2 = 500.
f(n) = n /2
Algorithm efficiency
Logarithmic loops : multiply loop
1. i=1
2 . loop (i < 1000)
1. application code
2. i = i x 2;
3. end loop
Here the loop runs as long as 2^iterations < 1000,
giving 10 iterations.
f(n) = log2 n
Algorithm efficiency
Logarithmic loops : Divide loop
1. i=1000
2 . loop (i >= 1)
1. application code
2. i = i / 2;
3. end loop
Here the loop runs as long as
1000 / 2^iterations >= 1, again giving 10 iterations.
f(n) = log2 n
Multiply loop             Divide loop
Iteration  Value of i     Iteration  Value of i
    1           1             1         1000
    2           2             2          500
    3           4             3          250
    4           8             4          125
    5          16             5           62
    6          32             6           31
    7          64             7           15
    8         128             8            7
    9         256             9            3
   10         512            10            1
(exit)       1024          (exit)          0
Algorithm efficiency
• Nested loops:
Consider how many iterations each loop
contains.

Total iterations = outer loop iterations x inner loop iterations

Let us consider 3 different nested loops:
linear logarithmic, dependent quadratic, and
quadratic.
Algorithm efficiency
Linear logarithmic:
1. i=1
2 . loop (i <= 10)
1. j = 1
2. loop (j <= 10)
1. application code
2. j = j * 2;
3. end loop
4. i = i + 1
3. end loop
Algorithm efficiency
Inner loop iterations = log2 10
Outer loop iterations = 10
Total iterations = 10 log2 10; in general,
f(n) = n log2 n
Algorithm efficiency
Dependent quadratic:
1. i = 1
2. loop (i <= 10)
   1. j = 1
   2. loop (j <= i)
      1. application code
      2. j = j + 1
   3. end loop
   4. i = i + 1
3. end loop
(When i = 4, the j values will be 1, 2, 3, 4.)
Algorithm efficiency
Outer loop iterations = 10 = n
Inner loop iterations depend on the outer loop:
the first outer iteration gives 1 inner iteration,
the second (i=2: j=1, 2) gives 2, and so on.
Over 10 iterations of i we get
1 + 2 + 3 + ... + 9 + 10 = 55 inner iterations.
The average is 5.5, which equals (10+1)/2,
so the inner loop averages (n+1)/2 iterations.
Total iterations = n(n+1)/2
Algorithm efficiency
Quadratic:
1. i=1
2 . loop (i <= 10)
1. j = 1
2. loop (j <= 10)
1. application code
2. j = j + 1;
3. end loop
4. i = i + 1
3. end loop
Algorithm efficiency
Inner loop iterations = n
Outer loop iterations = n
Total iterations = n x n = n²
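As a sanity check (illustrative code, not from the slides), counting the iterations of the three nested-loop patterns for n = 10:

public class LoopCounts {
    public static void main(String[] args) {
        int n = 10;

        int linLog = 0;                       // linear logarithmic
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= n; j *= 2)
                linLog++;

        int depQuad = 0;                      // dependent quadratic
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= i; j++)
                depQuad++;

        int quad = 0;                         // quadratic
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= n; j++)
                quad++;

        // Expect n * ceil(log2 n) = 40, n(n+1)/2 = 55, n * n = 100
        System.out.println(linLog + " " + depQuad + " " + quad);
    }
}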
Algorithm efficiency
• Big –O Notation:
For a given function g(n), we denote by O(g(n))
(pronounced "big-oh of g of n" or sometimes
just "oh of g of n") the set of functions
O(g(n)) = { f(n) : there exist positive constants c
and n₀ such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n₀ }.
We use O-notation to give an upper bound on a
function, to within a constant factor.
Algorithm efficiency
The Big-O notation can be derived from
f(n) using the following steps:
1. In each term, set the coefficient of the
term to 1.
2. Keep the largest term in the function
and discard the others.
Terms are ranked from lowest to highest
as follows:
log2 n, n, n log2 n, n², n³, ..., nᵏ, 2ⁿ, n!
Algorithm efficiency
eg:
To calculate the Big-O notation for
f(n) = n(n+1)/2 = ½ n² + ½ n
Remove all coefficients:
n² + n
Discard the smaller terms:
n²
So the Big-O notation is stated as O(f(n)) = O(n²)
Algorithm efficiency
Efficiency           Big-O
Logarithmic          O(log2 n)
Linear               O(n)
Linear logarithmic   O(n log2 n)
Quadratic            O(n²)
Polynomial           O(nᵏ)
Exponential          O(cⁿ)
Factorial            O(n!)
Exercise
1. Write a pseudo code algorithm for
dialing a phone number.
2. Write a pseudo code for finding n
prime numbers. Find efficiency of the
algorithm and represent it in Big-O
notation.
Searching and Sorting
Searching
• Linear Search
– Checks every element of a list until a match is found
– Can be used to search an unordered list
• Binary search
– Searches a set of sorted data for a particular value
– Considerably faster than a linear search
– Can be implemented using recursion or iteration
Searching
• Linear search:
     i = 1
     loop (i <= n)
        if (a[i] == key)
           print "key found"
           break
        i = i + 1

Best case: value equals the first element tested -> 1 comparison
Average case: N/2 comparisons needed
Worst case: value not in the list -> N comparisons needed
Overall: O(N)
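In Java, linear search might be written as follows (method and parameter names are illustrative):

// Returns the index of key in a, or -1 if not found.
static int linearSearch(int[] a, int key) {
    for (int i = 0; i < a.length; i++) {
        if (a[i] == key) {
            return i;          // best case: first element tested
        }
    }
    return -1;                 // worst case: N comparisons
}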
Binary search
Find 29 in the list

[10, 13, 14, 29, 37]

Examine 14: [10, 13] [14] [29, 37]


Examine 29: [10, 13, 14] [29] [37]
Binary search
min := 1;
max := N;   { array size: var A : array [1..N] of integer }
repeat
   mid := (min + max) div 2;
   if x > A[mid] then min := mid + 1
   else max := mid - 1;
until (A[mid] = x) or (min > max);
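An equivalent iterative version in Java, with the common refinement of returning the index (or -1 when the value is absent):

// Binary search over a sorted array; returns index of x or -1.
static int binarySearch(int[] a, int x) {
    int min = 0, max = a.length - 1;
    while (min <= max) {
        int mid = (min + max) / 2;
        if (a[mid] == x) return mid;
        if (x > a[mid]) min = mid + 1;   // search right half
        else            max = mid - 1;   // search left half
    }
    return -1;
}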
Binary search
• Worst case performance O(log n)
• Best case performance O(1)
• Average case performance O(log n)
Sorting
• Insertion sort
• Quick sort
• Merge sort
• Radix sort
Insertion sort
Method: select one element at a time and
insert it into its proper position
• Ex. arranging a hand of cards
• Begin by dividing the array into two regions:
sorted and unsorted
• Initially, the sorted region is empty
• At each step, move first unsorted item into
its proper position in the sorted region
Insertion sort
1. A[0] is sorted; A[1]..A[N-1] are unsorted
2. Repeat N-1 times:
   1. nextItem = first unsorted element
   2. Shift sorted elements > nextItem over
      one position (A[x] = A[x-1])
   3. Insert nextItem into correct position
Insertion sort
• Insert 35 into following list

11 17 19 42 43 47
• Shift all numbers > 35 one position
to the right, then insert.

11 17 19 35 42 43 47
Insertion sort
Inserting into a sorted array:
– Finding the right spot: O(log N)
– Performing the shuffle: O(N)
– Performing the insertion: O(1)
– Total work: O(log N + N + 1) = O(N)
Insertion sort
• Algorithm
insertionSort(array A)
begin
   for i := 1 to length[A] - 1 do
   begin
      value := A[i];
      j := i - 1;
      while j >= 0 and A[j] > value do
      begin
         A[j + 1] := A[j];
         j := j - 1;
      end;
      A[j + 1] := value;
   end;
end;
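The same algorithm as runnable Java (0-based indexing):

static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int value = a[i];          // next unsorted item
        int j = i - 1;
        while (j >= 0 && a[j] > value) {
            a[j + 1] = a[j];       // shift larger elements right
            j = j - 1;
        }
        a[j + 1] = value;          // insert into its spot
    }
}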
Insertion sort
• Start: [29 ][ 10, 14, 37, 13]
• Move 10: [10, 29 ][ 14, 37, 13]
• Move 14: [10, 14, 29 ][ 37, 13]
• Move 37: [10, 14, 29, 37 ][ 13]
• Move 13: [10, 13, 14, 29, 37 ][ ]
• End: [10, 13, 14, 29, 37]
Insertion sort
• Worst case performance O(n²)
– When array is in reverse order
• Best case performance O(n)
– When array is already sorted
• Average case performance O(n²)
– Quadratic
• Insertion sort works well for small inputs
(fewer than about 10 elements)
Mergesort and Quicksort
Sorting algorithms
• Insertion, selection and bubble sort
have quadratic worst-case performance
• Can comparison-based algorithms do
better? Yes: O(n log n)
• Mergesort and Quicksort
Merge Sort
• Apply divide-and-conquer to sorting problem
• Problem: Given n elements, sort elements
into non-decreasing order
• Divide-and-Conquer:
– If n=1 terminate (every one-element list is already
sorted)
– If n>1, partition elements into two or more
sub-collections; sort each; combine into a single sorted list
• How do we partition?
Partitioning - Choice 1
• First n-1 elements into set A, last element
into set B
• Sort A using this partitioning scheme
recursively
– B already sorted
• Combine A and B using method Insert() (=
insertion into sorted array)
• Leads to recursive version of InsertionSort()
Partitioning - Choice 2
• Put element with largest key in B, remaining
elements in A
• Sort A recursively
• To combine sorted A and B, append B to
sorted A
– Use Max() to find largest element -> recursive
SelectionSort()
– Use bubbling process to find and move largest element
to right-most position -> recursive BubbleSort()
• All O(n²)
Partitioning - Choice 3
• Let’s try to achieve balanced
partitioning
• A gets n/2 elements, B gets the other half
• Sort A and B recursively
• Combine sorted A and B using a
process called merge, which combines
two sorted lists into one
– How? We will see soon
Example
• Partition into lists of size n/2

[10, 4, 6, 3, 8, 2, 5, 7]

[10, 4, 6, 3] [8, 2, 5, 7]

[10, 4] [6, 3] [8, 2] [5, 7]

[4] [10] [3][6] [2][8] [5][7]


Example Cont’d
• Merge

[2, 3, 4, 5, 6, 7, 8, 10 ]

[3, 4, 6, 10] [2, 5, 7, 8]

[4, 10] [3, 6] [2, 8] [5, 7]

[4] [10] [3][6] [2][8] [5][7]


Static Method mergeSort()
public static void mergeSort(Comparable a[], int left, int right)
{  // sort a[left:right]
   if (left < right)
   {  // at least two elements
      int mid = (left + right) / 2;   // midpoint
      mergeSort(a, left, mid);
      mergeSort(a, mid + 1, right);
      merge(a, b, left, mid, right);  // merge from a to auxiliary array b
      copy(b, a, left, right);        // copy result back to a
   }
}
Merge sort
function merge(left, right)
   var list result
   while length(left) > 0 and length(right) > 0
      if first(left) ≤ first(right)
         append first(left) to result
         left = rest(left)
      else
         append first(right) to result
         right = rest(right)
   end while
   if length(left) > 0
      append left to result
   else
      append right to result
   return result
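For the array-based mergeSort() shown earlier, the helpers merge() and copy() are not given on the slides; a plausible Java sketch:

// Merge sorted runs a[left..mid] and a[mid+1..right] into b.
static void merge(Comparable[] a, Comparable[] b,
                  int left, int mid, int right) {
    int i = left, j = mid + 1, k = left;
    while (i <= mid && j <= right)
        b[k++] = (a[i].compareTo(a[j]) <= 0) ? a[i++] : a[j++];
    while (i <= mid)   b[k++] = a[i++];   // leftover left run
    while (j <= right) b[k++] = a[j++];   // leftover right run
}

// Copy b[left..right] back into a.
static void copy(Comparable[] b, Comparable[] a, int left, int right) {
    for (int k = left; k <= right; k++) a[k] = b[k];
}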
Merge sort
Sort the following:
10, 4, 6, 3, 8, 5, 2 => n = 7, n/2 = 3.5 ~ 3
[10, 4, 6] [3, 8, 5, 2]
[10, 4, 6] => [10] [4, 6]
[3, 8, 5, 2] => [3, 8] [5, 2]
Merge sort
10, 4, 6, 3, 8, 5, 2
[10, 4, 6] [3, 8, 5, 2]

[10] [4, 6] [3, 8] [5, 2]

[10] [4, 6] [3, 8] [2, 5]


Merge Sort

[2, 3, 4, 5, 6, 8, 10]

[4, 6, 10] [2, 3, 5, 8]

[10] [4, 6] [3, 8] [2, 5]


Quicksort Algorithm
Given an array of n elements (e.g., integers):
• If array only contains one element, return
• Else
– pick one element to use as pivot.
– Partition elements into two sub-arrays:
• Elements less than or equal to pivot
• Elements greater than pivot
– Quicksort two sub-arrays
– Return results
Quicksort Algorithm

[ elements < pivot ]  [ pivot ]  [ elements > pivot ]


Example
We are given array of n integers to sort:
40 20 10 80 60 50 7 30 100
Pick Pivot Element
There are a number of ways to pick the pivot
element. In this example, we will use the first
element in the array:

40 20 10 80 60 50 7 30 100
Partitioning Array
Given a pivot, partition the elements of the array
such that the resulting array consists of:
1. One sub-array that contains elements <= pivot
2. Another sub-array that contains elements > pivot

The sub-arrays are stored in the original data
array.

Partitioning loops through, swapping elements
below/above the pivot.
pivot_index = 0    40 20 10 80 60 50  7 30 100
               [0] [1] [2] [3] [4] [5] [6] [7] [8]
               (too_big_index starts at the left,
                too_small_index at the right)

1. While data[too_big_index] <= data[pivot]
   ++too_big_index
2. While data[too_small_index] > data[pivot]
   --too_small_index
3. If too_big_index < too_small_index
   swap data[too_big_index] and data[too_small_index]
4. While too_small_index > too_big_index, go to 1.
5. Swap data[too_small_index] and data[pivot_index]

Tracing the array as the two indices move inward:

40 20 10 80 60 50  7 30 100   too_big_index stops at 80,
                              too_small_index stops at 30
40 20 10 30 60 50  7 80 100   after swapping 80 and 30
40 20 10 30  7 50 60 80 100   after swapping 60 and 7;
                              the indices then cross
 7 20 10 30 40 50 60 80 100   step 5 swaps pivot 40 into place

pivot_index = 4     7 20 10 30 40 50 60 80 100
               [0] [1] [2] [3] [4] [5] [6] [7] [8]
Partition Result

7 20 10 30 40 50 60 80 100
[0] [1] [2] [3] [4] [5] [6] [7] [8]

<= data[pivot] > data[pivot]


Recursion: Quicksort Sub-arrays

7 20 10 30 40 50 60 80 100
[0] [1] [2] [3] [4] [5] [6] [7] [8]

<= data[pivot] > data[pivot]
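Putting partition and recursion together, a compact Java sketch of quicksort with this first-element-pivot, two-index partition (an illustrative version, not taken verbatim from the slides):

static void quicksort(int[] data, int lo, int hi) {
    if (lo >= hi) return;                 // one element: already sorted
    int pivot = data[lo];
    int big = lo + 1, small = hi;
    while (big <= small) {
        while (big <= hi && data[big] <= pivot) big++;   // step 1
        while (data[small] > pivot) small--;             // step 2
        if (big < small) {                               // step 3
            int t = data[big]; data[big] = data[small]; data[small] = t;
        }
    }
    data[lo] = data[small];               // step 5: pivot into place
    data[small] = pivot;
    quicksort(data, lo, small - 1);       // elements <= pivot
    quicksort(data, small + 1, hi);       // elements > pivot
}

A full sort is then quicksort(data, 0, data.length - 1).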


Quicksort Analysis
• Assume that keys are random, uniformly
distributed.
• What is the best case running time?
– Recursion:
1. Partition splits array in two sub-arrays of size n/2
2. Quicksort each sub-array
– Depth of recursion tree? O(log2 n)
– Number of accesses in partition? O(n)
• Best case running time: O(n log2 n)
• Worst case running time?
Quicksort: Worst Case
• Assume first element is chosen as
pivot.
• Assume we get array that is already in
order:
pivot_index = 0     2  4 10 12 13 50 57 63 100
               [0] [1] [2] [3] [4] [5] [6] [7] [8]

1. While data[too_big_index] <= data[pivot]
   ++too_big_index
2. While data[too_small_index] > data[pivot]
   --too_small_index
3. If too_big_index < too_small_index
   swap data[too_big_index] and data[too_small_index]
4. While too_small_index > too_big_index, go to 1.
5. Swap data[too_small_index] and data[pivot_index]

too_big_index stops at once (4 > 2), while too_small_index
walks all the way down to the pivot itself. No swaps occur,
and step 5 swaps the pivot with itself:

2 | 4 10 12 13 50 57 63 100
<= data[pivot]   > data[pivot]

One sub-array is empty; the other holds the remaining n-1
elements, so each level of recursion removes only one element.

Quicksort Analysis
• Assume that keys are random, uniformly
distributed.
• Best case running time: O(n log2 n)
• Worst case running time?
– Recursion:
1. Partition splits array in two sub-arrays:
• one sub-array of size 0
• the other sub-array of size n-1
2. Quicksort each sub-array
– Depth of recursion tree? O(n)
– Number of accesses per partition? O(n)
• Worst case running time: O(n²)!

• To avoid the worst case, there are various ways
of selecting the pivot.


Radix Sort
• How IBM made its money: punch card
readers for census tabulation in the early
1900s. Card sorters worked on one column
at a time.
• Sort each digit (or field) separately.
• Start with the least-significant digit.
• Must use a stable sort.
RADIX-SORT(A, d)
1 for i ← 1 to d
2 do use a stable sort to sort array A on digit i
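A runnable Java sketch (assuming non-negative integers and decimal digits; a counting sort serves as the stable per-digit sort):

static void radixSort(int[] a, int d) {      // d = number of decimal digits
    int n = a.length;
    int[] out = new int[n];
    for (int i = 0, exp = 1; i < d; i++, exp *= 10) {
        int[] count = new int[10];
        for (int v : a) count[(v / exp) % 10]++;               // digit histogram
        for (int j = 1; j < 10; j++) count[j] += count[j - 1]; // prefix sums
        for (int k = n - 1; k >= 0; k--)                       // stable placement
            out[--count[(a[k] / exp) % 10]] = a[k];
        System.arraycopy(out, 0, a, 0, n);                     // next pass reads a
    }
}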
Radix Sort in Action
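The figure from this slide is not reproduced; a standard worked trace conveys the same idea:

329 457 657 839 436 720 355   input
720 355 436 457 657 329 839   after sorting on the 1s digit
720 329 436 839 355 457 657   after sorting on the 10s digit
329 355 436 457 657 720 839   after sorting on the 100s digit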
Correctness of Radix Sort
• induction on number of passes
• base case: low-order digit is sorted correctly
• inductive hypothesis: show that a stable sort
on digit i leaves digits 1...i sorted
– if 2 digits in position i are different, ordering by
position i is correct, and positions 1 .. i-1 are irrelevant
– if 2 digits in position i are equal, numbers are already
in the right order (by inductive hypothesis). The stable
sort on digit i leaves them in the right order.
• Radix sort must invoke a stable sort.
Running Time of Radix Sort
• use counting sort as the invoked stable
sort, if the range of digits is not large
• if digit range is 1..k, then each pass
takes Θ(n+k) time
• there are d passes, for a total of
Θ(d(n+k))
• if k = O(n), time is Θ(dn)
• when d is const, we have Θ(n), linear!
Summary
• Insertion sort
• Merge sort
• Quick sort
• Radix sort
Algorithm design methods
The Greedy Method
Technique
• The greedy method is a general algorithm design
paradigm, built on the following elements:
– configurations: different choices, collections, or values to find
– objective function: a score assigned to configurations, which
we want to either maximize or minimize
• It works best when applied to problems with the
greedy-choice property:
– a globally-optimal solution can always be found by a series of
local improvements from a starting configuration.
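A small illustration (not from the slides): greedy change-making, which has the greedy-choice property for canonical coin systems such as {25, 10, 5, 1}, though not for arbitrary ones:

// Greedy change-making: take the locally best (largest) coin each step.
static java.util.List<Integer> makeChange(int amount, int[] coins) {
    java.util.List<Integer> used = new java.util.ArrayList<>();
    for (int c : coins) {              // coins sorted in decreasing order
        while (amount >= c) {
            used.add(c);
            amount -= c;
        }
    }
    return used;                       // e.g., 63 -> [25, 25, 10, 1, 1, 1]
}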
Branch and bound
• Branch and bound (BB) is a general
algorithm for finding optimal solutions
of various optimization problems,
especially in discrete and combinatorial
optimization. It consists of a systematic
enumeration of candidate solutions, where
large subsets of fruitless candidates are
discarded en masse using upper and lower
estimated bounds on the quantity being
optimized.
Branch and bound
• Used in the following cases:
• Knapsack problem
• Integer programming
• Nonlinear programming
• Traveling salesman problem (TSP)
• Quadratic assignment problem (QAP)
• Maximum satisfiability problem (MAX-SAT)
• Nearest neighbor search
Dynamic programming
• Dynamic programming is both a
mathematical optimization method and a
computer programming method. In both
contexts, it refers to simplifying a
complicated problem by breaking it down
into simpler subproblems in a recursive
manner, and reusing the solutions to
overlapping subproblems. While some
problems cannot be taken apart this way,
decisions that span several points in time
often do break apart recursively.
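A minimal sketch of the idea (illustrative): computing Fibonacci numbers recursively while caching each subproblem's answer so it is solved only once:

static long[] memo = new long[91];     // fib(90) still fits in a long

static long fib(int n) {
    if (n <= 1) return n;              // base cases
    if (memo[n] != 0) return memo[n];  // subproblem already solved
    memo[n] = fib(n - 1) + fib(n - 2); // break into simpler subproblems
    return memo[n];
}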
Divide and conquer
• Divide the problem into subproblems
• Solve the subproblems
• Combine the results
Algorithm design techniques
• Divide and conquer Algorithm
To solve a problem of size N, recursively
solve two subproblems of size
approximately N/2, and combine their
solutions to yield a solution to the
complete problem.
Algorithm design techniques
• Divide and conquer Algorithm
E.g., binary search
merge sort
quick sort
Algorithm design techniques
• Divide and conquer – Algorithm template
Function P(n)
   if n <= c
      solve P directly
      return its solution
   else
      P => P1, ..., Pk       // divide
      for i = 1 to k
         Si = P(ni)          // conquer
      S1, ..., Sk => S       // merge
      return S
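Instantiating the template on a toy problem (illustrative): finding the maximum of an array by divide and conquer:

// Divide and conquer maximum: split, solve halves, combine.
static int max(int[] a, int lo, int hi) {
    if (lo == hi) return a[lo];        // base case: size 1, solve directly
    int mid = (lo + hi) / 2;           // divide
    int left  = max(a, lo, mid);       // conquer
    int right = max(a, mid + 1, hi);
    return Math.max(left, right);      // merge
}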
Summary
• Introduction
• Algorithm Efficiency
• Big O notation
• Searching and sorting
• Algorithm design techniques
