
CS 5084: Introduction to Algorithms, Fall 2015
Class 12. December 3, 2015.
Posted 12:47, November 19, 2015

Minimum Spanning Trees (MST)

The minimum spanning tree problem is the poster-application of greedy algorithms: it is an important, non-trivial problem, and yet has several greedy, efficient algorithms! This is a rare situation.
In most involved problems greed does not pay; but when it does, the result is often admirable.
G = (V, E; w); w is a weight function, w : E → ℝ. G is assumed connected and undirected.
This is the first time we assume edges have weights. Almost without exception, the weights we use are nonnegative integers.
MST: A tree on V, reaching all the nodes and using a subset of E, such that the sum of its weights is minimal.
See the example on page 5. We need a few definitions:
cut of a set: the partition of the set into two¹ (or more) disjoint subsets. V can have a cut C = (S, V − S).
An edge crosses a cut if its end-nodes are in distinct subsets of the cut.
A cut of V respects a set A ⊆ E if no edge in A crosses the cut.
A light edge of a cut is an edge which crosses the cut and has a smaller weight than all others crossing the cut (if several edges have the same minimal weight, they are all considered light).
A promising set: a subset of some MST edge-set.
A safe edge with respect to a promising set: adding it to the set leaves it promising.
Note: The word "some" in the definition of a promising set is significant; there is no guarantee that the set would be in all or most minimum spanning trees, but there is the guarantee of one possible way of completing it to an MST.
The text suggests the following generic MST algorithm, which uses a set A of edges, initially empty,
and clearly, an empty set is promising:
¹ In the proof of Theorem 1 we look at a cut of V into two subsets with a smooth boundary, but nothing changes in the needed arguments if (a) the number of subsets is larger and the boundaries between them are crazed, and (b) the number of components is not constant during the life of the algorithm.


GenericMstAlgorithm (G(V, E; w))
1. { A = ∅;
2.   while (A is not a spanning tree)
3.     add to A a safe edge (u, v);
4.   return A;
5. }
How do we find that safe edge?
Theorem 1. Let G = (V, E; w) be a connected, undirected graph with weight function w on E. Let A ⊆ E be a promising set respected by a cut C = (S, V − S), and let (u, v) be a light edge for this cut. Then A′ = A ∪ {(u, v)} is a promising set. Hence a light edge, like (u, v), is safe.
Proof:
[Figure: the cut (S, V − S); the light edge e = (u, v) crosses it, as does another edge e′.]
Let T be the MST promised by A. If (u, v) is one of the edges of T, then clearly A′ is promising. So assume (u, v) ∉ T, and add this edge to T, creating a cycle: n edges on n nodes! Since T spans V, there is at least one other edge on the cycle which crosses the cut (and is therefore not in A). Let it be e′ = (u′, v′). Since e is light, w(e) ≤ w(e′). Deleting e′ from T and keeping e, we are left with a spanning tree T′ of no higher weight which includes all of A′, hence A′ is indeed promising.

This type of argument, which consists of adding an edge to an MST and examining the cycle we created, is also the key to showing properties of MSTs.

Graph manipulation and minimum spanning trees


1. Consider the situation where you have a graph as above, and you know its minimum spanning tree, T. Then the weight of some edge a ∈ E is reduced. Find the new MST induced by this change.
If a is one of the edges used in T, nothing changes: we simply have a lighter, better MST.
Otherwise we use the same device as above: add the edge a to T, and it will create a simple cycle (there are not enough edges for a non-simple cycle). Test the weight of all the edges on the cycle: if a is the heaviest, never mind, the same tree is still the MST; but otherwise, toss out the heaviest edge you found on the cycle to break it.
Note: This has been a non-algorithmic description; the specification of the algorithms below should make it possible for you to answer questions like "how do we find which edges are on the cycle?" and "how do we toss out an edge from the MST?"

2. Again we have a graph G(V, E; w) as above, and we know its minimum spanning tree T. Now a new edge ε, of weight w(ε), is added to E, but V is unchanged: ε is between two nodes that are spanned by T; let us call the evolved graph G′. Is T an MST of G′ as well? The determination is similar: add the new edge ε to T, creating a simple cycle, and check the weights of the edges on the cycle. If ε is found to be the heaviest on the cycle, no change needs to be made, and the same tree continues to serve; but otherwise, if some edge e on the cycle reveals that w(e) > w(ε), we make the exchange: delete e and use ε instead. A sketch of this device in code follows.
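Here is a small sketch of this device in Python. The representation (a symmetric adjacency dictionary) and all the names are mine, not the text's; it is an illustration, not a definitive implementation.

    # A sketch, not the text's code: T is a spanning tree stored as a
    # symmetric adjacency dict {node: {neighbor: weight}}.
    def add_edge_keep_mst(T, u, v, w_new):
        """Add an edge (u, v) of weight w_new; restore minimality of T."""
        # Find the unique tree path u ~> v; with (u, v) it closes the cycle.
        parent = {u: None}
        stack = [u]
        while stack:
            x = stack.pop()
            for y in T[x]:
                if y not in parent:
                    parent[y] = x
                    stack.append(y)
        path = [v]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])       # back up toward u
        # The heaviest tree edge on the cycle.
        hx, hy = max(zip(path, path[1:]), key=lambda e: T[e[0]][e[1]])
        if T[hx][hy] > w_new:                   # the exchange pays off
            del T[hx][hy]; del T[hy][hx]
            T[u][v] = w_new
            T[v][u] = w_new

The same routine answers question 1 above as well: a reduced-weight edge not in T is just a "new" edge with a smaller weight.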

Several algorithms that build MSTs, and in particular the two we look at, are specializations of the GenericMstAlgorithm. They differ in the way they define the cut, and in the method used to identify a light edge as an acceptable safe choice.

Kruskal algorithm for minimum spanning trees


This algorithm is simple to present and understand, but needs careful work to execute efficiently. Fortunately we have done the preparatory work, when we described an efficient method for disjoint-set representation.

Kruskal (G(V, E; w))
1.  { A = ∅;
2.    F = E sorted by non-decreasing edge weight;
3.    for all v ∈ V: MakeSet(v);
4.    Traverse F in order while (|A| < n − 1)   // A is not yet a spanning tree
5.    { Let e = (u, v) be next in F;
6.      cu = FindSet(u); cv = FindSet(v);
7.      if (cu ≠ cv) { A = A ∪ {e}; Union(cu, cv); }
8.    }
9.    return A;
10. }
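For concreteness, here is a Python sketch of KA (my own rendering, not from the text), with the disjoint-set forest using path compression and union by rank:

    # A sketch: nodes is an iterable of labels; edges is a list of
    # (w, u, v) triples. Returns the MST edge list A.
    def kruskal(nodes, edges):
        parent = {v: v for v in nodes}              # line 3: MakeSet(v)
        rank = {v: 0 for v in nodes}
        def find(x):                                # FindSet, compressing paths
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        A = []
        need = len(parent) - 1                      # n - 1 edges suffice
        for w, u, v in sorted(edges, key=lambda e: e[0]):   # F, line 2
            cu, cv = find(u), find(v)
            if cu != cv:                            # line 7: distinct components
                A.append((u, v, w))
                if rank[cu] < rank[cv]:             # Union by rank
                    cu, cv = cv, cu
                parent[cv] = cu
                if rank[cu] == rank[cv]:
                    rank[cu] += 1
            if len(A) == need:                      # line 4: spanning tree done
                break
        return A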


Correctness is due to Theorem 1. The relevant cut does not admit the nice image of a line slicing the set V in two, and that is because it uses an evolving number of parts. Initially the cut is best viewed as small circles around each node, so it respects no edge, and the lightest edge in E is the light edge for this cut. As nodes are Union-ed, and edges enter A, the circles expand to accommodate the growing components, so that the evolving cut respects A (including the nodes that A supports) and no other edge. This means that the cut needs to change its shape and member-count for each newly inserted edge.
If the efficient disjoint-set representation is used, the main cost is that of sorting E.
How many edges need be considered? We denote this value by t; it can be anywhere in the range (n − 1, . . . , |E|). It will be at the top of the range for a graph like [figure: a graph whose spanning tree must use an edge of weight 10⁶], with the other weights smaller than a million.
Many of the properties of minimum spanning trees can be inferred from the mechanics of Kruskal's algorithm. Examples:
The two lightest edges in the graph are always part of the MST.
When all the weights are distinct, the minimum spanning tree is unique.
Here is how this last claim can be shown:
Consider building the MST using Kruskal's algorithm. The sequence of edges it considers is unique; exactly those edges are selected that connect distinct components, and there is no room for alternatives, because the unique weights mean a unique sorted order.
Then we need to consider the fact that several algorithms for MST construction are known, and allow for the possibility that another algorithm, A, could find another MST. To show this cannot be the case, let {e_1, e_2, . . . , e_m} be the sorted list of all the edges in the graph, which we denoted by F. Let e_r be the first edge in the list that A uses, and KA does not. The reason KA does not use it is that when it comes to it in the sorted list, its end nodes are found to be already in the same component. So there is a path between them, and e_r would close a cycle. This edge e_r is the heaviest edge on the cycle, since all others were picked from the sorted list before it; hence removing it is the best way to restore the tree property.² Hence there cannot be a first edge that A uses and KA does not. A similar claim, which needs slightly different reasoning, shows that there is also no first (in F) edge which KA uses and A does not. Since they use the same number (|V| − 1) of edges, they must create the same tree.
² This is the place we use the greedy nature of KA. Theorem 1 assures us that precisely such myopic decisions will result in an optimal tree; we never need to select and keep in a cycle a heavy edge, with the hope that this pays a dividend somehow down the line. Would that three-dimensional life was so simple!


This is the type of proof we often use in proving that two structures must be identical, even if created by very different means: order them, find the position of first difference, reach a contradiction.
I have seen attempts to prove this claim which amounted to: given an MST, if we try to add an edge, we get a cycle; then either the new edge is the heaviest, which changes nothing, or it is lighter than some tree-edge on the cycle, and we can exchange them, which yields a lighter tree, which means it was not a minimal tree to begin with; so we cannot get a different MST. This is not a proof, however: it shows that local changes do not lead to anything new; but some other algorithm (Prim's, Boruvka's, and there are a few others) with different selection criteria could find a different way to cobble together an equally good, but quite different MST.
In fact, we know that MSTs created by different algorithms must share a lot of edges: we saw that the lightest edge (or arc) adjacent to each node must be included in each MST. The reason this fact does not determine the minimum spanning tree entirely is that many such edges will be light for both their end nodes; still, this allows us to claim that at least half the edges of an MST are common to all MSTs; only after those are collected in can algorithms get independently creative.
On the other hand, the second-lightest spanning tree (2-MST) is not unique under such conditions. Consider a quadrilateral ABCD, with the edge set {(A, B), (B, C), (C, D), (D, A), (A, C)} of size 5 and the corresponding lengths (6, 5, 2, 1, 3): it has the MST {(A, D), (C, B), (C, D)} of weight 8, and the 2-MSTs {(A, D), (A, B), (C, D)} and {(A, D), (A, C), (B, C)}, both of weight 9. A brute-force check of this example appears below.
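The claim can be checked mechanically; here is a brute-force Python sketch (mine, for illustration) that enumerates every three-edge subset of these five edges and keeps those that span:

    from itertools import combinations

    # The quadrilateral example; edge lengths as listed above.
    E = {('A','B'): 6, ('B','C'): 5, ('C','D'): 2, ('D','A'): 1, ('A','C'): 3}
    trees = []
    for sub in combinations(E, 3):
        nodes = {x for e in sub for x in e}
        if len(nodes) < 4:
            continue                  # a three-edge cycle misses a node
        comp = {x: x for x in nodes}  # tiny union-find for connectivity
        def find(x):
            while comp[x] != x:
                x = comp[x]
            return x
        for a, b in sub:
            comp[find(a)] = find(b)
        if len({find(x) for x in nodes}) == 1:
            trees.append((sum(E[e] for e in sub), sorted(sub)))
    print(sorted(trees)[:3])   # the MST at weight 8, then two trees of weight 9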

? In a graph with unique edge-weights, are the weights of all spanning trees distinct? Not necessarily, as the last example shows; here is an even more blatant example, with the complete graph K4:
[Figure: the complete graph K4 on the nodes a, b, c, d, with distinct edge weights; the labels 2, 3, 5 and 7 are among those visible.]
Here either of the two three-edge sets U = {(d, a), (d, b), (d, c)} and V = {(c, a), (c, b), (c, d)} spans the graph, and both have total weight 9. And indeed, neither of these is the MST, which is {(d, a), (d, c), (b, c)}: it spans the graph and has weight 6. The edge (c, d) is used in both U and V, hence its combined color in the figure!

It is the case that in large, dense graphs sorting the edges is the dominant part of the run time, and that the number of edges scanned by Kruskal is usually a small fraction of |E|, though we saw an example to the contrary. This coincidence suggests we may only want a prefix of the sorted list. How can this be done meaningfully? And what does "meaningfully" mean here?
Making the effort only means anything if obtaining a small prefix costs materially less than a complete sort. The only method we have seen which achieves this is heapsort, where the heap construction is relatively inexpensive: it is in Θ(|E|). Then we can obtain a prefix of size, say, 2|V|, and run the rest of Kruskal; if this prefix is exhausted before an MST is obtained, a further segment can be produced. An even better option is to combine the two operations: instead of creating F as above, start by creating a min-heap in it, and then remove from it one edge at a time, using it in line 5 of KA, until the tree is obtained. This combined option is sketched below.
The data structures that KA needs are what is required to produce the sorted edges (or possibly a heap) of size |E|, and the area to allocate for the disjoint-set representation, which is of size 3|V| for the last version.
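The combined option is easy to render with Python's heapq; a sketch, under the same assumptions as the kruskal sketch above (only the handling of F changes, and union by rank is omitted for brevity):

    import heapq

    # A sketch: heapify F in Theta(|E|), then pop only the t edges examined.
    def kruskal_lazy(nodes, edges):
        F = [(w, i, u, v) for i, (w, u, v) in enumerate(edges)]
        heapq.heapify(F)                    # cheap heap construction
        parent = {v: v for v in nodes}
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        A, need = [], len(parent) - 1
        while F and len(A) < need:
            w, _, u, v = heapq.heappop(F)   # next lightest edge; F never fully sorted
            cu, cv = find(u), find(v)
            if cu != cv:
                A.append((u, v, w))
                parent[cu] = cv
        return A

The counter i in the heap entries is only there to break weight ties without comparing node labels.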

Here is another approach to constructing a spanning tree, which is similar to Kruskal but uses the technique shown under graph manipulations above instead: no sorting of edges is made, but the mechanism of the disjoint-set representation is set in motion. Edges are selected from E in arbitrary order. If they bring together unconnected components we add them, creating a forest; it is empty initially, and it grows. If an edge is found to be between vertices that are in the same component (the condition in line 7 of Kruskal fails), we add it, observe the cycle we created, and delete the heaviest edge on it. This continues until we are certain we have an MST. When does that happen?
When we have exhausted E: just consider the possibilities of leaving one of the two lightest edges in the graph to be the last choice. . . .

Prim algorithm for minimum spanning trees


Prim's algorithm is closer in spirit to Theorem 1 of the previous notes. Like Kruskal, we raise, or grow, a tree one edge at a time, but it differs in that each edge joins those that have been selected so far, rather than scattering pieces all over the place. This algorithm could lead to better locality of storage usage.
We can select explicitly the node r from which the tree grows, and the cut evolves from the initial C_r = ({r}, V − {r}). Let T_A denote the set of nodes which support A, the ever-promising set of edges we collect; then at that stage the cut is C_A = (T_A, V − T_A).


Prim (G(V, E; w), r)
1. { for all v ∈ V { key[v] = ∞; p[v] = NULL; }
2.   p[r] = r; key[r] = 0; Q = V;  // Q is a priority queue, maintained by key[v] values
3.   while (Q ≠ ∅)   // not all the nodes have yet been reached
4.   { u = min(Q);   // retrieve the node closest to T_A, mark it as out-of-Q
5.     for all neighbors v of u
6.     { if (v ∈ Q and w(u, v) < key[v])
7.       { p[v] = u; key[v] = w(u, v); update Q; }  // e = (u, v) joins A
8.     }
9.   } return p[ ]; }
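A Python sketch of the same procedure (names mine, not the text's); heapq has no "update Q" operation, so the customary workaround is to push a fresh entry and skip stale ones when dequeued:

    import heapq

    # A sketch: G is {node: {neighbor: weight}}, undirected and connected.
    def prim(G, r):
        key = {v: float('inf') for v in G}; key[r] = 0    # line 1
        p = {v: None for v in G}; p[r] = r                # line 2
        in_tree = set()
        Q = [(0, r)]                       # entries are (key[v], v)
        while Q:                           # line 3
            _, u = heapq.heappop(Q)        # line 4: u = min(Q)
            if u in in_tree:
                continue                   # stale entry; u left Q earlier
            in_tree.add(u)
            for v, w in G[u].items():      # line 5
                if v not in in_tree and w < key[v]:   # line 6
                    p[v] = u; key[v] = w              # line 7
                    heapq.heappush(Q, (w, v))         # the 'update Q' step
        return p                           # the tree, as parent pointers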

Page 635 in the text shows this algorithm on a particular example. The algorithm maintains two data structures:
Two arrays of size |V|, p[v] and key[v]. They identify, for each v ∉ T_A, the lightest edge between v and T_A and its weight (it is infinite if there is no such edge). p[v] gives the node at the other end of that light edge, and it is a potential parent of v if this edge is joined to the growing tree. key[v] is interpreted as the cost of attaching v to the growing tree; initially it is unbounded. When a vertex that has an edge to v gets into the tree, the weight of that edge gets to be key[v], its entry cost. As successive nodes join the tree, at lower cost, line 6 tests whether this node is now able to reduce its cost to the weight of the newly available edge.
And the other data structure is Q. It is conceptually possible to skimp and not maintain Q explicitly, only mark which nodes have joined the tree (which we do in any case, to keep the first test in line 6 simple), and replace the retrieval of u in line 4 by a search of the array key for the current light edge. Performance-wise, not a good idea.
This algorithm does not maintain the set of edges A explicitly. The returned tree is defined by the parent-pointer array p.
The connection to the Theorem in terms of the run-time variables is that the set S, which we denote above by T_A, is V − Q, and A = {(p[v], v) : v ∈ V − Q − {r}}.
The performance of this algorithm depends mostly on the management of the set of unspanned nodes maintained in Q. If we maintain Q as a min-heap, each extraction of the closest node u in line 4 takes O(log V) time, and this is done V times. Both tests in line 6 take constant time, for a total in O(E), but in the worst case we may find that we also make order-E changes of distances of nodes to T_A, and each such change in Q has a cost which is in O(log V); hence this requires, in the worst case, O(E log V) time. This appears comparable to the cost of Kruskal's algorithm, but they behave truly differently, and for different graphs each can display superiority.

Shortest-Paths in Graphs
"Shortest" is used for historical reasons. We define the length of a path between nodes in a graph as the sum of the lengths of the edges on the path. But the lengths are usually referred to as weights. They can represent any additive attribute.
The problem comes in several flavors:
1. Between one source and a single destination.
2. From one source to all possible destinations.
3. From any source to all destinations.

In addition we distinguish between scenarios where
(a) all edges have the same weights,
(b) weights may differ but must all be non-negative, and
(c) even this last property need not³ be satisfied.
Problems of type 1 cannot be done without tackling type 2, but may be terminated early, once the
optimal path to the prescribed destination has been determined.
Problems of type 2(a) are solved by breadth-first search.
Solving problems of type 2 in general creates a shortest-paths tree rooted at the source that spans all
nodes reachable from the source.
Why a tree? Since no cycle need ever be considered, regardless of the problem type or weight
function characteristics.
The graphs may be directed or not. The first kind is needed to model situations such as:
- one-way streets,
³ Negative weights find use in describing situations where both gain and loss (or cost) can take place. Possible contexts are money (example: when considering arbitrage transactions with several currencies), or energy (example: a chemical manufacturing process, in which some reactions are endothermic and require energy to proceed, and some are exothermic and generate more heat than is pumped in).


- non-equality of distances, or weights, or costs, in the two directions,
- negative weights, which always require that the graph is directed.
NOTATION: nearly all the needed notation was seen before. The graphs are given as G = (V, E; w), where w : E → ℝ is the weight function; we can then define the weight for any pair of nodes,

    w(u, v) = w(e)   when e = (u, v) ∈ E;
    w(u, v) = ∞      otherwise, when (u, v) ∉ E.
In addition we shall denote by w(p) the weight of a path p (the sum of its edge-weights), and by δ(u, v) the weight of the minimal-weight path between the nodes u and v.
Triangle inequality: The relation that holds for the side-lengths of any triangle in the Euclidean plane: a < b + c, as in (∗) below. Significantly, the weight function we use may or may not satisfy this relation over any triangle, but the minimality of δ(·, ·) implies

    δ(u, v) ≤ δ(u, t) + w(t, v).    (∗)

[Figure: a triangle on the nodes u, t, v, with sides labeled δ(u, t), w(t, v) and δ(u, v).]
A useful feature in the text: it defines two procedures that are common to most minimal-path algorithms:

InitializeSingleSource (G(V, E; w), s)   // abbreviated below as Initialize
1. { for each v ∈ V
2.   { d[v] = ∞;       // minimal weights come here
3.     p[v] = NULL; }  // potential parent node
4.   d[s] = 0; p[s] = s;  // the source is special
5. }

Relax (u, v, w)   // a historical name; not descriptive
6. { if (d[v] > d[u] + w(u, v))
7.   { d[v] = d[u] + w(u, v);  // like Prim's MST algorithm
8.     p[v] = u; }
9. }
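In Python, the two procedures might look like this (a sketch; the graph representation, an adjacency dict, and the plain dictionaries d and p are my choices, not the text's):

    # A sketch of the two shared procedures, for G = {node: {neighbor: weight}}.
    def initialize_single_source(G, s):
        d = {v: float('inf') for v in G}   # minimal weights come here
        p = {v: None for v in G}           # potential parent node
        d[s] = 0; p[s] = s                 # the source is special
        return d, p

    def relax(u, v, w_uv, d, p):
        # A shorter path to v, through u? Then record it.
        if d[v] > d[u] + w_uv:
            d[v] = d[u] + w_uv
            p[v] = u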

Dijkstra algorithm
Edsger Dijkstra invented and discovered many algorithms, but stated as above, the topic is understood to be finding shortest paths in a graph.



It is a greedy, fast algorithm, which requires, for correct results, that w(u, v) ≥ 0 for all pairs of nodes.

Dijkstra (G(V, E; w), s)
1. Initialize (G, s);
2. S = ∅;   // a set to collect the nodes with known minimal distances
3. Q = V;   // a priority min-queue, arranged by d[ ] values
4. while (Q ≠ ∅)   // satisfied |V| times
5. { u = min(Q);   // dequeue the closest node
6.   S = S ∪ {u};   (∗)
7.   for each node v adjacent to u
8.     Relax (u, v, w);   // requires updating Q if d values change
9. }
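Assembled in Python with a binary heap for Q (a sketch, using the representation of the earlier sketches; as in the Prim sketch, stale heap entries are skipped rather than updated in place):

    import heapq

    # A sketch: G = {node: {neighbor: weight}}, all weights non-negative.
    def dijkstra(G, s):
        d = {v: float('inf') for v in G}; d[s] = 0
        p = {v: None for v in G}; p[s] = s
        S = set()                          # nodes with known minimal distance
        Q = [(0, s)]                       # a priority min-queue on d values
        while Q:
            du, u = heapq.heappop(Q)       # line 5: dequeue the closest node
            if u in S:
                continue                   # stale entry
            S.add(u)                       # line 6, the starred line
            for v, w in G[u].items():      # lines 7-8: relax each edge out of u
                if d[v] > du + w:
                    d[v] = du + w; p[v] = u
                    heapq.heappush(Q, (d[v], v))
        return d, p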
Claim: The algorithm Dijkstra calculates correctly the minimal distances from s to all nodes in the graph.
Note: The literature usually requires G to be directed, but the proof does not need this assumption.
Let us see one use of the algorithm; each node name appears with the current value in the vector d.
[Figure: three snapshots of the example graph on the nodes a, b, c, d, e, f, g, h, i, each node labeled with its current d value; the shaded area in each snapshot is the set S. First snapshot: a|0, all others ∞. Second snapshot, after a and b are dequeued: a|0, b|2, g|5, c|6, f|9, the rest ∞. Third snapshot, after g and c as well: a|0, b|2, g|5, c|6, f|9, i|7, d|8, h|10, e|∞.]

The algorithm is called with a given as the source (the order in which nodes are searched here only matters when they have equal d[ ] values). The first diagram shows the state of the graph as line 4 is entered for the first time. Following that, a is removed from Q, and it is relaxed in line 7, leading to changes in the d[ ] values for all the neighbors of a. Then b is dequeued, and the second diagram shows the state of the algorithm after it is relaxed. The set S consists of the nodes in the shaded area, and Q can be inferred from it, but is not represented explicitly.
The next one to be dequeued is g, relaxing d[h] to 10 and d[i] to 7, so that c is the next in line, updating the value of d[d] to 8, reaching the state shown in the rightmost diagram. The next to be dequeued is i, and it will relax f to 8 and remove the last ∞ in Q, relaxing e to 10 (it will later be relaxed, from d, to 9, as will h).
We now proceed to prove the effectiveness of the algorithm.
Proof: We use an invariant which holds following the execution of the starred line (6, above). The invariant has two parts, book-keeping and an optimality claim:
1. Q = V − S.
2. d[v] = δ(s, v) and p[v] ∈ S, for all v ∈ S.
Initially S = ∅, Q = V (though in a particular structure), and the invariant holds in an empty sense; and when line 6 is reached, u = s is the dequeued element, and it satisfies both parts of the invariant.
Induction (maintenance) step: Part 1 remains true, since we insert into S each node as it is removed from Q. For part 2, we observe that by the invariant the distances from s to all nodes in S are optimal, so the relaxation will not change them; thus part 2 holds for all of S except possibly the node u which was just dequeued and inserted into S. We need to show that this must be the case for it as well.
[Figure: the shaded set S contains s, x, r, z, y; outside it are u and t. The edge (x, u) enters u from S, r also has an edge to u, and t lies on a potential alternative path.]
Note that the path from s to u is entirely within the previous nodes of S (the shaded area), except the last edge, from x = p[u] to u. The parent x is in S, since u got that parent pointer created just as x joined S. And while other nodes in S may have edges to u, such as r, the pointer p[u] identifies the one which offered the shortest path from s. Could there be a shorter path using nodes not yet in S, such as t? No: since we know now that u was dequeued, not t, we have d[u] ≤ d[t], and the path length p(t, u) ≥ 0 as well. If both of the inequalities are realized as equality, then it does not matter, really, since the path through x is still optimal, though not unique.
Note: nodes join S in order of monotonically non-decreasing distance from s.
The algorithm is likely to remind you of Prim's MST algorithm, so differentiating the two should hopefully reinforce understanding of their principles of operation.
What is the cost? Let the graph have n nodes and m edges. Initialization is in Θ(n), and then come n searches for the minimally distant node u to dequeue.


This can increase as fast as n² operations with a naive implementation, or the more restrained n log n using a heap as the priority queue. Then come the relaxations; there are 2m of them, and in the worst case, if all or most cause a change in d values (each such change may require an update of Q), their cost is in Θ(m log n), and this determines the total cost, O(m log n).
The text mentions more modest costs if we (a) use more elaborate structures for the priority queue (Fibonacci heaps, or binomial heaps), and (b) average over the cost of the worst-case sequences of events (what they call amortized cost). The elaborate heap structures allow us to keep the growth rate of the cost to O(n log n) for moderately dense graphs, a term reserved for graphs where m ∈ o(n²/log n); but unless the graphs are truly huge, any savings are more than offset by the higher constants involved.
Looking back at the algorithms due to Prim and Dijkstra, they look similar: start at a single node and raise from it a spanning tree. Both algorithms define and maintain a figure of merit for each node not yet in the tree, and at each stage pick the smallest one to join the tree; the length of the edge connecting the node to the tree plays a role in that figure of merit, d[v]. Indeed, it is only the way that d[v] is calculated that distinguishes between them. Moreover: they would occasionally produce the same spanning tree! But not always, and Prim's does not tell you about the distances on that tree. Here is, however, a small example where the algorithms would produce different spanning trees, using again the graph K4:
Clearly, Prim's algorithm picks the MST with the lightest edges:
[Figure: K4 with the MST edges highlighted.]
but for shortest distances this is not optimal at all; in fact, my choice of edge-weights allows for several alternatives; here is one:
[Figure: K4 with a shortest-paths tree from the source s highlighted.]
Note that this one does not use the lightest edge!

© 2015, Micha Hofri.
