
Introduction to Algorithms Problem Set 5 Solutions

CS 4820 Fall 2017 Due 11:59pm Thursday, September 28

For a quick review of probability you may want to look at the handout from our TA Abhi posted on
CMS, the lecture notes from an old run of CS 2800 (http://www.cs.cornell.edu/~rafael/discmath.pdf),
or Section 13.1 of our textbook.

(1) (10 points) The security team of your company noticed that there are a surprisingly large number
of outside hacks going on; they wonder if much of this may be the doing of one hacker. There were n
events 1, . . . , n, each with an event log xi . The security team developed a reliable method that tests,
given the logs xi and xj of two events, whether the two are the doing of the same hacker. However, these tests
take quite a while to perform. They are interested in knowing if the majority of these events were all
due to a single hacker. Give an algorithm that performs at most O(n log n) of such log comparisons to
determine if this is the case. You may assume that n is a power of 2.

Solution via Divide and Conquer Use a divide and conquer strategy. Recursively solve the
problem for the two halves S1 = {1, . . . , n/2} and S2 = {n/2 + 1, . . . , n}. Notice two facts:
(a) If the majority of the hacks are due to a single hacker, this must be the case for at least one of
the two halves. To see why, notice that if the original set S = {1, . . . , n} has at least n/2 + 1 hacks
coming from the same hacker, then more than n/4 of these must fall on one of the two halves, and on
that half they form a majority.
(b) On each side there can be at most one such hacker that committed the majority of the hacks on
that side.
Now we have three cases to consider due to (b) above:
• if neither side had a majority hacker, then the original set S also doesn’t have a majority hacker
by (a) above.
• if one of the two sides had a majority hacker, and the other side did not, then let xi be one of
the hacks committed by this hacker. If there is a majority hacker, it must be this person. To test
whether this is indeed the case, test xi against all xj for j ≠ i. This requires at most n
comparisons.
• if both sides have a majority hacker, let xi and xj be hacks committed by these two hackers,
respectively. If there is a majority hacker, it must be one of these two persons. To test if indeed
this is the case (and to decide which of the two), test both xi and xj against all xk to determine
this. This requires at most 2n comparisons.

Majority(x1 , . . . , xn )
    If n = 1 return ‘‘yes’’ and x1
    run Majority(x1 , . . . , xn/2 )
    if it returns ‘‘yes’’ and a candidate y
        count the number of xi ’s (among all of x1 , . . . , xn ) that are by the same hacker as y
        if this number > n/2 return ‘‘yes’’ and y
    run Majority(xn/2+1 , . . . , xn )
    if it returns ‘‘yes’’ and a candidate y
        count the number of xi ’s (among all of x1 , . . . , xn ) that are by the same hacker as y
        if this number > n/2 return ‘‘yes’’ and y
    return ‘‘no’’
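The recursion above can be sketched in Python. Here `same_hacker` is a stand-in name (ours, for illustration) for the security team's pairwise log test, and any equality-comparable values can play the role of the logs:

```python
def majority(logs, same_hacker):
    """Return a log by the majority hacker, or None if no single
    hacker committed more than half of the hacks in logs."""
    n = len(logs)
    if n == 1:
        return logs[0]
    # fact (a): a global majority must be a majority of at least one half
    for half in (logs[:n // 2], logs[n // 2:]):
        candidate = majority(half, same_hacker)
        if candidate is not None:
            # verify the half's candidate against the whole list (at most n tests)
            if sum(1 for x in logs if same_hacker(x, candidate)) > n // 2:
                return candidate
    return None
```

With equality as the comparison, `majority([1, 1, 2, 1], lambda a, b: a == b)` returns 1, while a list split evenly between two hackers yields None, since exactly half is not a majority.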

To see that this algorithm requires O(n log n) comparisons, let T (n) be the number of comparisons
needed on a list of length n. From the above, we get that T (n) ≤ 2T (n/2) + 2n. This recurrence solves
to T (n) ≤ 2n log2 n = O(n log n). For this last point, the students can quote the book, or lecture, or

say that there will be at most log2 n levels of recursion, and at each level we will need to do at most
2n comparisons: at level k, we have 2^k subproblems of size n/2^k , and each subproblem needs 2 · n/2^k
comparisons, which is a total of 2n for the level.

Randomized Algorithm

Majority(x1 , . . . , xn )
    Repeat log2 n times
        select an index i at random
        test xi against all xj
        if xi is a majority, output xi and stop the algorithm
    endRepeat
    output "probably no majority element"

If there is a majority element, each iteration has probability at least 1/2 of picking such an element,
and then outputting it. So the probability that log2 n iterations all fail to find the majority, if one exists,
is at most (1/2)^(log2 n) = 1/n.
While this algorithm is not guaranteed to find the correct answer, it is very likely to find the correct
answer.
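A sketch of this randomized check in Python (again with a hypothetical `same_hacker` oracle in place of the log test):

```python
import math
import random

def probably_majority(logs, same_hacker, rng=random):
    """Repeat about log2(n) times: pick a random log and test it against
    all others; a true majority is found in one round with probability >= 1/2."""
    n = len(logs)
    for _ in range(max(1, math.ceil(math.log2(n)))):
        candidate = rng.choice(logs)
        if sum(1 for x in logs if same_hacker(x, candidate)) > n // 2:
            return candidate
    # wrong only with probability <= 1/n when a majority exists
    return None  # "probably no majority element"
```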

Linear time solution There is also an O(n) solution to this, which turns out to be rather well
known. Here is how it works:

Majority(x1 , . . . , xn )
    count = 1
    set y = x1 \\ our candidate for majority
    For i = 2 to n
        if count ≥ 1
            test if y is by the same hacker as xi
            if it is, increase count by 1
            else decrease count by 1
        else (count = 0)
            set y = xi and count = 1
    endFor
    test y against all xi
    if the majority are by the same hacker as y
        return ‘‘yes’’ and y
    else return ‘‘no’’

Alternately, one can also set y = xi at the step when count decreases to 0 (the code above sets y = xi+1 instead).
The number of comparisons is clearly at most 2n (that is, O(n)).
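The pass above, rendered as Python (with the hypothetical `same_hacker` oracle again standing in for the log test; the final verification pass is essential, since the surviving candidate need not be a true majority):

```python
def majority_linear(logs, same_hacker):
    """Majority-vote scan using at most about 2n pairwise comparisons."""
    candidate, count = logs[0], 1
    for x in logs[1:]:
        if count == 0:
            candidate, count = x, 1      # restart with a fresh candidate
        elif same_hacker(x, candidate):
            count += 1
        else:
            count -= 1
    # verification pass: is the surviving candidate really a majority?
    if sum(1 for x in logs if same_hacker(x, candidate)) > len(logs) // 2:
        return candidate
    return None
```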
The reason this algorithm works correctly is a bit more tricky. What we need to prove is that if there
is a majority hacker, then the hack y when we get to i = n is a hack by this person. The test run at the
end reveals whether y is or isn’t a majority, so all we need is that no other hacker can be the majority.
Proof by induction We prove the statement by induction on n. When n = 1 the statement is
obvious.
Note that all we have to prove is this: if there is a majority hacker, then the hack y when i gets to
n is by this person. It is fine if some y is found at that point even when there is no majority hacker; that is why we
do the test of y at the end.
Consider the initial part of the sequence while y = x1 . If y never changes, then the hacker of x1 committed at least
half of the sequence, and the algorithm returns the correct answer. Suppose instead that count decreases to 0 after
inspecting item xi . Note that at this point, exactly half of the hacks in the x1 , . . . , xi sequence are by
the hacker who committed x1 . So if there is a majority person, this person has done at most half of

the hacks in this part of the sequence, so must have done more than half in the remaining sequence
xi+1 , . . . , xn . Since i ≥ 1 here, this is a shorter sequence, so by the induction hypothesis the algorithm
will return a correct y if there is a majority in this later part, proving that the algorithm is correct.

(2) (10 points) Given two sorted arrays a = (a1 , a2 , . . . , an ) and b = (b1 , b2 , . . . , bm ) (with ai < ai+1
and bi < bi+1 for all i), and an integer k: Give an algorithm that finds the k-th element of the merged
list in O(log(n + m)) time. You may assume all numbers ai and bj are different, and that you have O(1)
access to any array elements ai or bj .

Solutions: All solutions use a form of binary search. It is no loss of generality to assume n and m
are both at most k, as the part of either array beyond the array’s kth element is certainly not the kth
overall. Each solution makes one recursive call, and they differ in arguing which of n, m or k gets a factor of two
smaller.
There is also no loss of generality in assuming that k ≤ m + n, as otherwise there is no kth element.
Solution 1: We’ll aim to solve this with one comparison and one recursive call with k/2 (or ⌈k/2⌉
if k is not even). This will show that there are at most log2 k = O(log k) comparisons.
If n and m are both at least k/2, then we let s = ⌊k/2⌋ and t = k − s, and compare as and bt .
Observation For any pair of indices with s + t = k, if as < bt then the kth element of the combined
list is the (k − s)th element of the lists as+1 , . . . , an and b1 , . . . , bt .
Proof To see why this is true, note that bt has at least (t − 1) + s = k − 1 elements smaller than it, so any element
on the b list above bt is too high. Also, as has at most (s − 1) + (t − 1) = k − 2 elements below it, so the first s
elements of the a list are below the kth and can be discarded, reducing k by s.
So the algorithm will do one recursive call on these smaller lists, with k′ = k − s ≤ ⌈k/2⌉.
If one of n or m is less than k/2, say n < k/2, then we can instead take s = n and t = k − n, and
use the observation above. If as < bt the recursive call will have an empty a list, so we can identify
the kth element with no further comparison; if as > bt , then the recursive call will have k′ = k − t < k/2.
The code is summarized below

Select(a1 , . . . , an ; b1 , . . . , bm ; k)
    If n = 0 return bk
    If m = 0 return ak
    If k = 1 return min(a1 , b1 )
    If n, m ≥ k/2 let s = ⌊k/2⌋ and t = k − s
    If n < k/2 let s = n and t = k − n
    If m < k/2 let t = m and s = k − m
    Compare as and bt
    If as < bt
        run Select(as+1 , . . . , an ; b1 , . . . , bt ; k − s)
    Else
        run Select(a1 , . . . , as ; bt+1 , . . . , bm ; k − t)

Correctness follows from the observation above.


To bound the running time note that the recursive call is either to a problem with n = 0 or m = 0
(when we used the last element of one of the arrays, and that turned out to be the smaller of the two), or
to a k′ that is at most 1/2 of the original k. So the number of iterations is at most log2 k ≤ log2 (n + m).
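A Python sketch of Solution 1 (k is 1-indexed; slicing stands in for passing subarray boundaries, so the work per call is not O(1) as written, but the number of comparisons between array elements matches the analysis):

```python
def select_kth(a, b, k):
    """k-th smallest (1-indexed) element of the merge of sorted lists a, b.
    Assumes 1 <= k <= len(a) + len(b) and all elements distinct."""
    if not a:
        return b[k - 1]
    if not b:
        return a[k - 1]
    if k == 1:
        return min(a[0], b[0])
    n, m = len(a), len(b)
    if n >= k / 2 and m >= k / 2:
        s, t = k // 2, k - k // 2        # s + t = k
    elif n < k / 2:
        s, t = n, k - n                  # use all of the short a list
    else:                                # m < k / 2
        s, t = k - m, m
    if a[s - 1] < b[t - 1]:
        # a_1..a_s are below the k-th; b_{t+1}.. are above it
        return select_kth(a[s:], b[:t], k - s)
    else:
        # b_1..b_t are below the k-th; a_{s+1}.. are above it
        return select_kth(a[:s], b[t:], k - t)
```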
Alternate argument: notice that the argument only needs that the lower of the two elements in the
comparison, and all elements known to be smaller than it, cannot be the kth. The restriction of the b array to
only its first t elements is not actually used, so it is fine not to do that.
Solution 2: Instead of dividing k, we can divide n or m.
We can just go with s = n/2 and t = m/2 (with floor or ceiling), and delete either the lower half of the list
whose middle element is smaller (when s + t is below k) or the upper half of the list whose middle element is
larger (when s + t is above k).
Observation 2 For any pair of indices s and t:
• if s + t ≤ k and as < bt , then the kth element of the combined list is the (k − s)th element of the
lists as+1 , . . . , an and b1 , . . . , bm .
• if s + t > k and as < bt , then the kth element of the combined list is the kth element of the lists
a1 , . . . , an and b1 , . . . , bt−1 .
To see why, notice that if s + t ≤ k we cannot throw away the elements above bt , but we still know that
as and the elements below it are too small. If s + t > k then bt and the elements above it are too high.

Select(a1 , . . . , an ; b1 , . . . , bm ; k)
    If n = 0 return bk
    If m = 0 return ak
    Let s = ⌈n/2⌉, and t = ⌈m/2⌉ \\ ceiling, so that s, t ≥ 1 even when n or m is 1
    if s + t ≤ k
        If as < bt
            run Select(as+1 , . . . , an ; b1 , . . . , bm ; k − s)
        if as > bt
            run Select(a1 , . . . , an ; bt+1 , . . . , bm ; k − t)
    if s + t > k
        If as < bt
            run Select(a1 , . . . , an ; b1 , . . . , bt−1 ; k)
        if as > bt
            run Select(a1 , . . . , as−1 ; b1 , . . . , bm ; k)

Correctness follows from the observation above.


To bound the running time, we note that either n or m decreases by a factor of 2 (at least; possibly
by 1 more if m or n is even), so in total this can happen at most log2 n + log2 m times, and log2 n + log2 m =
log2 (nm) ≤ 2 log2 (n + m) = O(log(n + m)).
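Solution 2 can be sketched the same way; here we take s and t with the ceiling, so both are at least 1 whenever the lists are nonempty (the choice the text leaves open):

```python
def select2(a, b, k):
    """k-th smallest (1-indexed) of the merge of sorted lists a and b,
    halving one of the two lists in each recursive call."""
    if not a:
        return b[k - 1]
    if not b:
        return a[k - 1]
    s, t = (len(a) + 1) // 2, (len(b) + 1) // 2   # middle indices, 1-indexed
    if s + t <= k:
        if a[s - 1] < b[t - 1]:
            return select2(a[s:], b, k - s)   # a_1..a_s cannot be the k-th
        else:
            return select2(a, b[t:], k - t)   # b_1..b_t cannot be the k-th
    else:
        if a[s - 1] < b[t - 1]:
            return select2(a, b[:t - 1], k)   # b_t and above are too high
        else:
            return select2(a[:s - 1], b, k)   # a_s and above are too high
```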
Alternate versions
We can also start by assuming both lists have more than 1 element, that is n, m, k > 1 (and consider
the special cases when one of these is 1 separately, as the base case). Take the middle element of the first
at most k entries on each side: let s = ⌊min(n, k)/2⌋ and t = ⌊min(m, k)/2⌋, and consider as and bt . By
this definition s + t ≤ k. We then use the first part of Observation 2 above.
The resulting code is

Select(a1 , . . . , an ; b1 , . . . , bm ; k)
    If n = 0 return bk
    If m = 0 return ak
    If k = 1 return min(a1 , b1 )
    (if n = 1 or m = 1, s or t below would be 0; handle these directly as base cases)
    Let s = ⌊min(n, k)/2⌋, and t = ⌊min(m, k)/2⌋
    Compare as and bt
    If as < bt
        run Select(as+1 , . . . , an ; b1 , . . . , bm ; k − s)
    Else
        run Select(a1 , . . . , an ; bt+1 , . . . , bm ; k − t)

Correctness follows from observation above.


To show the running time, notice that one of the lists gets shorter, and k also gets smaller. Maybe
the simplest is to observe that either min(n, k) or min(m, k) decreases by at least a factor of 2. This
can happen at most log2 n + log2 m times, and log2 n + log2 m = log2 (nm) ≤ 2 log2 (n + m).
Alternately, one can assume n, m ≤ k (that is, delete the parts of the arrays above the kth element), or argue
with more cases, where k can sometimes decrease by a factor of 2 (as in the first proof).

(3) (10 points) Recall that our randomized minimum cut algorithm showed that a graph with n
nodes can have at most n(n − 1)/2 minimum cuts. This bound is tight, as a cycle has exactly this many:
deleting any pair of edges disconnects the graph into two non-empty sides. Suppose we are interested
not only in the minimum cuts, but in all cuts that are within a factor of at most 2 of the minimum.

Suppose the minimum cut in a graph has c edges. Show that the number of cuts with at most 2c edges
is at most O(n^4 ).

Solution: Let G be an n node graph, and assume the minimum cut of G has c edges. Consider a
cut separating the nodes of the graph into two sets (A, B) with at most c̄ = 2c edges. Let Ē denote the
set of edges in this cut. We’ll show that the random contraction algorithm has probability at least 24/n^4
of not contracting any edge in Ē till it gets down to 4 nodes. A four node graph has 7 cuts, so with
probability at least 24/n^4 our cut (A, B) is among these 7 cuts. If we then select one of these 7 cuts
uniformly at random, with probability at least (24/7)/n^4 ≥ 3/n^4 we end up with our cut (A, B). This
proves the claim: if there are K such cuts, each is the output of the contraction algorithm (with the
random selection at the end) with probability at least 3/n^4 , and these are disjoint events, so
K · 3/n^4 ≤ 1, that is, K ≤ n^4 /3 = O(n^4 ).
Recall from class that the degree of each node v in G must be at least c, as otherwise the cut with v
as the only node on one side would be smaller than c. This implies that the number of edges in G is
at least m ≥ cn/2, as summing the degrees of the nodes counts every edge twice.
Now consider the contraction algorithm: when contracting a random edge e in G, the probability that
this edge is in Ē is at most c̄/m ≤ 4/n, so with probability at least (1 − c̄/m) ≥ (1 − 4/n) the first edge contracted
is not in our special cut. Now we have a graph with only n − 1 nodes, with minimum cut still at least c,
and our special cut still has at most c̄ = 2c edges. Repeating this argument shows that the probability
of ending up with the cut (A, B) preserved at the end is at least
(1 − 4/n)(1 − 4/(n − 1))(1 − 4/(n − 2)) · · ·

We must stop when the remaining graph has 4 nodes, as the next factor would be (1 − 4/4) = 0. This product can
be written as

((n − 4)/n) · ((n − 5)/(n − 1)) · ((n − 6)/(n − 2)) · · · (1/5) = (1 · 2 · 3 · 4)/(n(n − 1)(n − 2)(n − 3)) = 24/(n(n − 1)(n − 2)(n − 3)) ≥ 24/n^4

proving the claim.
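The experiment in the proof can be run directly. The sketch below (the function name and the union-find bookkeeping are ours, for illustration) contracts random edges until 4 super-nodes remain, then outputs one of the 7 nonempty bipartitions of those super-nodes uniformly at random:

```python
import random

def random_small_cut(nodes, edges, rng):
    """One run of the contraction-plus-random-cut experiment from the proof."""
    parent = {v: v for v in nodes}

    def find(v):                         # union-find root with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    alive = len(nodes)
    while alive > 4:
        live = [(u, v) for (u, v) in edges if find(u) != find(v)]
        u, v = rng.choice(live)          # uniform over surviving (non-loop) edges
        parent[find(u)] = find(v)        # contract u and v into one super-node
        alive -= 1
    supers = sorted({find(v) for v in nodes})
    mask = rng.randrange(1, 8)           # nonempty subset of the first 3 super-nodes
    side = {s for i, s in enumerate(supers[:3]) if (mask >> i) & 1}
    A = {v for v in nodes if find(v) in side}
    return A, set(nodes) - A
```

On a cycle, every pair of edges is a minimum cut, so repeated runs should quickly turn up a cut with only 2 crossing edges.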
