Tremblay Chap 19

Data Structures and Software Development in an Object-Oriented Domain, Java Edition
Jean-Paul Tremblay and Grant A. Cheston
University of Saskatchewan

© 2008 J.P. Tremblay and G.A. Cheston
Chapter 19
Graphs
Contents
19.1 Introduction and Examples of Graph Modeling
19.2 Basic Definitions of Graph Theory
19.3 Graph ADT
19.4 Paths, Reachability, and Connectedness
19.5 Graph Representations
    19.5.1 Adjacency Matrix Representation
    19.5.2 Adjacency Lists Representation
    19.5.3 Searchable Graph
19.6 Computing Paths from a Matrix Representation of Graphs
    19.6.1 Computing Reachability, Using Matrix Multiplications
    19.6.2 Efficient Reachability Algorithm
    19.6.3 All Pairs Shortest Paths Algorithm
    19.6.4 Single Source Shortest Paths Algorithm
19.7 Traversals of Undirected Graphs
    19.7.1 Breadth-First Search
    19.7.2 Depth-First Search
19.8 Applications
    19.8.1 Connectivity and Components
    19.8.2 Spanning Trees
    19.8.3 Topological Sorting
    19.8.4 Scheduling Networks
    19.8.5 Graphs in Testing
19.9 Concluding Remarks

The data structure introduced in this chapter, the graph, is a generalization of the tree data structure. It is similar to a tree in that a graph consists of a set of nodes and a set of lines, with each line joining a pair of nodes. In a tree, the lines are used to define a hierarchical structure (i.e., a node has one parent and may have several children). This is not true in a graph, where a node is simply joined to a number of other nodes by lines. Figure 19.1 shows two graphs. In the left graph, there is a direction associated with each line, as indicated by an arrowhead; such a graph is called directed. In the right graph, the lines have no direction, and the graph is called undirected. Usually, a graph is either directed or undirected, although sometimes a graph has both directed and undirected lines.
Figure 19.1. Examples of directed and undirected graphs

Graphs have been found to be useful in a wide variety of application areas. To some extent, each area has developed its own notation. For example, sometimes the nodes are called vertices or points. Also, the lines can be called arcs or edges. Often, we say that an undirected graph consists of vertices and edges and a directed graph consists of nodes and arcs. Either or both of the vertices and edges may be labeled with letters, numbers, or some other identification.

Section 19.1 will present the use of graphs in a number of applications. In discussing these applications, a number of graph terms will be informally introduced. Sections 19.2 and 19.3 will present more formal views of graphs, first through more precise definitions of graph terms and then through an Abstract Data Type (ADT) specification. Many interesting properties of graphs are related to paths in graphs. Section 19.4 presents some basic path concepts. There are two standard representations for a graph within a program; these will be described in Section 19.5. The next two sections present a number of graph algorithms. Section 19.6 develops algorithms for computing paths in a graph. Section 19.7 presents the two standard traversals of a graph: breadth-first and depth-first. Finally, in Section 19.8, the details and algorithms are presented for a number of graph applications, including connected components, network scheduling, and software testing using graphs.

19.1 Introduction and Examples of Graph Modeling

Graphs are used widely to model problems in many different application areas. Constructing a model is essentially a process of deciding what features or aspects of a real-world problem or application are to be represented for the current analysis or study. The model of an application varies widely depending on the view or purpose that a modeler has for the application. Good models capture the essence of the real world that is relevant to the current problem and ignore the details irrelevant to this problem. Moreover, good models are robust; that is, they have the ability to remain relevant as the applications evolve.

This section outlines several problems to which graph theory has been applied successfully. Although many problems could have been chosen, the ones presented in this section illustrate the diversity of application areas that can be modeled with graph structures. One of the objectives of this section is to get an intuitive idea of a graph by looking at the diagrams of several graphs. A glance at the next few pages shows several graphs. Although several of the graphs have somewhat different structures, note that every diagram consists of a set of vertices
(nodes, points). These are shown as circles, ovals, small dots, or other icons. The vertices are sometimes unlabeled, whereas others are labeled by letters, numbers, or other strings. Also, in every diagram certain pairs of vertices are joined by lines. These lines are more often called edges or arcs. The other details, such as the positions of the vertices, the shape or length of the edges, and so on, are of no importance at present. Notice that every edge starts at one vertex and ends at another vertex. If the edge has a specific direction, as indicated by an arrowhead, it is called directed. Otherwise, it is called undirected.

A formal definition of the graph, which is essentially an abstract mathematical system, will be given in the next section. The preceding informal definition of a graph contains no reference to the length, shape, and positioning of the edge joining any pair of vertices, nor does it prescribe any ordering of the positions of the vertices. Therefore, for a given graph, there is no unique diagram that represents the graph, and it can happen that two diagrams that look entirely different represent the same graph.

Figure 19.2. A graph of major highway arteries in Western Canada (vertices: Prince George, Edmonton, Prince Albert, Saskatoon, Calgary, Vancouver, Winnipeg, Regina, and Victoria; each edge is labeled with a distance in kilometers)

Example 19.1 One of the most frequent uses of a graph occurs when you plan a vacation. If travel is by car, road maps represent the possible roadway systems that can be used for the trip. A road map is actually a model of the main highways for travel purposes. Also, a road map is a graph where the vertices are the towns and cities in some geographical area, and the edges represent the roads joining these cities or towns. Figure 19.2 shows a graph of the main highway arteries joining the major cities in Western Canada. Each edge or road in the graph allows travel in either direction, so the edges are undirected. The label (a number) associated with each edge denotes the distance in kilometers between the two cities. When planning a trip, a traveler may be interested in the distance between two cities by using some particular route (e.g., between Winnipeg and Victoria via Edmonton). Because of time constraints, a traveler may also be interested in the minimum distance between two given cities.

Example 19.2 The computer simulation of traffic systems is a tool frequently used by transportation and city planners. The systems modeled range from the traffic network across a nation, a city, or an area of a city down to the traffic flow on one bridge or in one intersection. The models are used to pinpoint present or future bottlenecks and to suggest and test proposed changes. In a city, the street system can be modeled as a graph where the intersections are represented as vertices, and the street segments between intersections are the edges. For example, see the street system of Figure 19.3a and the graph of Figure 19.3b. Two-way streets are represented as undirected edges (i.e., edges without an arrowhead), whereas one-way streets are directed edges. One of the applications of such a graph includes the determination of the shortest
path that the police and fire departments would use to answer a 911 call. Note that the edges could be labeled with street names, traffic densities, and so on.

Figure 19.3. A graphic representation of a city street system: (a) street system; (b) corresponding graph

Example 19.3 A more recent use of graphs has been in the modeling of computer networks. A computer network typically consists of a variety of elements such as computers and communication lines. In a graph representation of a computer network, each vertex is a device such as a computer or terminal, and each edge or link denotes a communication medium such as a telephone line or communication cable. Many industrial firms and universities have one or more local area networks that typically cover an area less than one square kilometer. In addition, there are long-haul or wide-area networks whose vertices, from a geographic viewpoint, can be located in one or more countries. Graphs are important in modeling these networks with respect to reliability and efficiency. A graph representation of a computer network appears in Figure 19.4. The subnet part of the network represents the communication part of the network. The devices around the communications subnet can be viewed as external devices. Although we have taken the liberty of representing terminals and personal computers by icons, it should be emphasized that in an abstract graph these components would be simple nodes. The arrangement of subnet vertices (nodes) and links in the subnet is not arbitrary. In a local area network, the subnet nodes are often arranged as a ring or star structure (see Figure 19.5). In wide-area networks, subnet nodes can be far apart and arranged in somewhat arbitrary ways.

Example 19.4 Most software products consist of modules that invoke each other. From an object-oriented perspective, a module might be a method or a whole class. A call graph represents modules by nodes. A directed line from a node x to a node y indicates that x invokes y. In the call graph of Figure 19.6, for example, module A invokes modules B, C, and D.
Modules B and C invoke module E. When one module invokes another, there must be communication between the two modules through an interface. An interface usually consists of a list of parameters. Some researchers have attempted to use graphs to evaluate the overall quality of a software system by modeling the system with an extended call graph that shows the module interfaces.

Example 19.5 There are several applications in computer science where it is convenient to model, or represent, an algorithm by a graph. An example of such an application arises in the context of generating test cases for an algorithm. These tests are derived by analyzing the structure of the algorithm with respect to control flow. The control flow is modeled by a flow graph. Each vertex in a flow graph represents one or more statements or instructions that do not involve branching. Thus, a sequence of statements preceding a conditional statement is mapped into a single vertex. The arcs (directed edges) in the flow graph represent the flow of control. Any module specified in a procedural or object-oriented language can be translated into a flow graph. For example, Figure 19.7 shows the flow-graph representations of certain familiar constructs that are available in many languages. The label T on an edge denotes a true branch, whereas F denotes a false branch.
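
The call graph described in Example 19.4 can be sketched in Java as a map from each module to the modules it invokes. This is only an illustrative sketch: the class name `CallGraphSketch` and its methods are hypothetical, and the arcs are limited to the ones the text states (A invokes B, C, and D; B and C invoke E). That D and E invoke nothing is an assumption, since Figure 19.6 may show more arcs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; the class and method names are not from the text.
class CallGraphSketch {

    // Each module maps to the modules it invokes (its outgoing arcs).
    static Map<String, List<String>> describedCalls() {
        Map<String, List<String>> calls = new LinkedHashMap<>();
        calls.put("A", List.of("B", "C", "D"));
        calls.put("B", List.of("E"));
        calls.put("C", List.of("E"));
        calls.put("D", List.of()); // assumption: the text lists no calls from D
        calls.put("E", List.of()); // assumption: the text lists no calls from E
        return calls;
    }

    // Modules invoked directly or indirectly from start, found by following arcs.
    static Set<String> invokedFrom(Map<String, List<String>> calls, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String module = pending.pop();
            for (String callee : calls.getOrDefault(module, List.of())) {
                if (seen.add(callee)) { // add returns false if already recorded
                    pending.push(callee);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> calls = describedCalls();
        System.out.println("A reaches " + invokedFrom(calls, "A"));
        System.out.println("B reaches " + invokedFrom(calls, "B"));
    }
}
```

Storing only outgoing arcs keeps the direction of each invocation explicit, which is what distinguishes a call graph from an undirected model such as the road map of Example 19.1.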

Figure 19.4. Graph representation of a computer network (labeled elements: personal computer, terminal, subnet, CPU)

Figure 19.5. Specialized node arrangements in local area networks: (a) ring arrangement; (b) star arrangement

As an example, Figure 19.8b shows the flow graph for the skeleton of a module in Figure 19.8a. Note that vertex 11 was created to represent statements after the outer if statement, although there are not any of them. The statements s1 through s9 in Figure 19.8a are assumed to be noncontrol statements, such as assignment statements. One testing approach is to use the flow graph of a module to generate a set of independent paths that must be executed in order to ensure that every statement and branch has been executed at least once; this approach is discussed in Section 19.8.5.

Example 19.6 One of the first computerized applications of graphs was concerned with project scheduling. The objective of scheduling a project is to schedule the completion of a number
of subtasks, called activities, so that the overall project can be completed as soon as possible. This is complicated by timing interrelationships between the activities. In most cases, the work on an activity can proceed independently of the progress of the work on many of the other activities. However, usually the work on an activity cannot be started until certain specific activities are completed. These timing considerations need to be analyzed to complete the project in the minimum time. The projects involved might be as simple as the erection of a house, but tend to be very complex, such as the design and construction of a power dam or the space race to be first on the moon. These techniques were instrumental in enabling the United States of America to put men on the moon as soon as it did.

Figure 19.6. A graph representation of a software system

Figure 19.7. Flow graph notation for various constructs (sequence, if-then-else, while, and switch; branches labeled T and F)

To illustrate project management concepts, we will first consider scheduling a moderate-sized software project. Rather than a specific project, the example given will be for an artificial project. It will be assumed to be a project with three subsystems, as well as a significant user interface and persistent storage interface. It is also assumed that the total project will require about 8 months in personnel time. The actual time from start to end will be less than this, as several people may be working on the project at the same time. Two of the objectives of doing the scheduling are to determine what needs to be done in parallel and where the bottlenecks are that must be cleared to complete the project in the minimum time.

The completion of a large project involves breaking the project into a set of interrelated activities or tasks. For our software project, the resulting activities are shown in the second column of Table 19.1. The project begins with the activity to identify the objectives (i.e., a general specification of what the project is to involve). Early in the project, the personnel who will use the software product must be consulted.
Thus, interviews are scheduled with users and managers who will be involved with the operation of the system. Obviously, general objectives of the project must be identified before the interviews. As a result, in the Preceding Activities column of Table 19.1, activity a1, identify the project, is given as a preceding activity for activities a2 and a3. The fourth activity shows writing an initial draft of the requirements for the project. This activity has two preceding activities, those
involving interviews with users and managers. The remaining activities in the table mostly follow the steps for developing a software product and so should be easy to follow. Of course, to schedule the timing of the project, time estimates are needed. Thus, the fourth column in Table 19.1 contains an estimate in days for the duration of each activity.

Figure 19.8. Modeling a module with a flow graph

(a) Skeleton of the module:

    void whatever() {
        s1
        s2
        while (!finished) {
            s3
            if (flag1 == 0) {
                s4
                s5
            } else {
                if (flag2 == 0)
                    s6
                else
                    s7
                s8
            }
        }
        s9
    }

(b) Flow graph with vertices 1 to 12: 1: s1; 2: s2; 3: while (!finished); 4: s3; 5: flag1 == 0; 6: s4, s5; 7: flag2 == 0; 8: s6; 9: s7; 10: s8; 11: no statements; 12: s9 (end of procedure). Branches are labeled T and F.

The creation of a table for a project, such as that given in Table 19.1, can be a nontrivial task for medium-sized projects. However, for large projects, the task can be formidable. The identification of detailed activities and the associated precedence requirements are based on experience, art, and the current practice in the industry. However, the extent to which scheduling is useful depends upon modeling the project by identifying meaningful activities and precedence requirements, and accurately estimating the time needed to complete each activity.

A graph with directed edges is a natural way of describing, representing, and analyzing the timing interrelationships between the activities. The scheduling of a project can be represented by a simple directed graph called a scheduling network, in which the activities become the edges of the graph. The nodes of the graph are used to coordinate the timing of activities. Thus, a node, called an event in scheduling terminology, represents the state of the project at a certain time. More specifically, an event represents the requirement that certain activities (the incoming edges) must be completed before other activities (the outgoing
edges) can be started.

Table 19.1. Software Project Activities

Edge | Activity | Preceding Activities | Estimated Duration (days)
a1 | Identify the project | none | 2
a2 | Initially interview users | {a1} | 3
a3 | Initially interview managers | {a1} | 2
a4 | Draft initial requirements | {a2, a3} | 4
a5 | Identify system boundary and main use cases | {a4} | 4
a6 | Review draft of requirements with users | {a5} | 3
a7 | Review draft of requirements with managers | {a5} | 2
a8 | Consult with external experts | {a5} | 3
a9 | Generate requirements document | {a6, a7, a8} | 10
a10 | Identify further use cases | {a9} | 3
a11 | Identify main objects and classes | {a10} | 5
a12 | Design system testing | {a9} | 5
a13 | Develop sequence diagrams and partial class diagrams for main use cases | {a11} | 8
a14 | Develop subsystems | {a13} | 2
a15 | Develop basic system architecture | {a14} | 2
a16 | Develop sequence diagrams and partial class diagrams for subsystem 1 | {a15} | 4
a17 | Develop sequence diagrams and partial class diagrams for subsystem 2 | {a15} | 6
a18 | Develop sequence diagrams and partial class diagrams for subsystem 3 | {a15} | 5
a19 | Refine subsystem organization and system architecture | {a16, a17, a18} | 3
a20 | Perform detailed design of users interfaces | {a19} | 5
a21 | Implement users interfaces | {a20} | 15
a22 | Test users interfaces | {a21} | 6
a23 | Evaluate and refine interfaces | {a22} | 6
a24 | Design persistent storage interfaces | {a19} | 4
a25 | Implement persistent storage interfaces | {a24} | 10
a26 | Test persistent storage interfaces | {a25} | 4
a27 | Subsystem 1 - design | {a19} | 4
a28 | Subsystem 1 - implementation | {a27} | 10
a29 | Subsystem 1 - testing | {a28} | 3
a30 | Subsystem 2 - design | {a19} | 8
a31 | Subsystem 2 - implementation | {a30} | 15
a32 | Subsystem 2 - testing | {a31} | 6
a33 | Subsystem 3 - design | {a19} | 6
a34 | Subsystem 3 - implementation | {a33} | 12
a35 | Subsystem 3 - testing | {a34} | 4
a36 | Systems integration | {a23, a29, a32, a35, a26} | 6
a37 | Integration testing | {a36} | 4
a38 | Review for quality | {a37} | 6
a39 | Restructure and revise system for quality | {a38} | 8
a40 | System testing | {a12, a39} | 10

The graph obtained for Table 19.1 is given in Figure 19.9. The activity a1 has no preceding activities, so it can be started at any time. This is represented by node 1 with no incoming edges and an edge for a1 outgoing. Activities a2 and a3 require that a1 has been completed; hence, their corresponding edges start at node 2. Moving on to consider activity a4, it requires that a2 and a3 have been completed. If a2 and a3 were to have the same ending node, this would result in two edges between a pair of nodes. Such edges are called parallel and create complications. To avoid these complications, a2 is directed into node 3 and a3 is directed into node 4, and two artificial activities are created from node 3 to node 5 and from node 4 to node 5. In the graph, these artificial activities are given labels d1 and d2 to reflect that they are artificial or dummy activities. Each dummy activity is given a duration of zero to reflect that it is only used for synchronization and does not require any time. Similarly, the rest of the events and activities are created as necessary and added to the graph. Dummy activities are needed to avoid having the edges for activities a6, a7, and a8 become parallel edges, and also for activities a16, a17, and a18. For convenience, the events are numbered sequentially. Finally, note that each edge is labeled with a number that represents the duration of the activity. The artificial activities are given duration zero, since they are not real activities that require time to complete.

In a directed graph, a vertex with no incoming edges is called a source, and a vertex with no outgoing edges is called a sink. In a scheduling graph for a project, there is exactly one source and exactly one sink. An important use of the graph of a project is to determine the earliest completion time for the project.
This time is determined by a sequence of activities such that the activities must be done sequentially, and the total duration of the sequence is at least as long as any other such sequence. Such a sequence is called a critical path. It is a path (sequence of edges) from the source to the sink such that if any activity on the path is delayed by an amount t, the entire project is delayed by t. A critical path for the graph in Figure 19.9 is shown with double lines. It consists of 22 nonzero activities with an earliest completion time of 120 days. In fact, there are two critical paths in this network, as activity a8 can be replaced by activity a6 and still have a path of longest duration. A delay in the completion of any activity on any critical path will result in a delay in completing the project. If the objective is to reduce the earliest completion time for the project, it is necessary to find and speed up at least one activity on every critical path. Since in practice for large graphs the number of activities that lie on a critical path is a small percentage of the total number of activities, say, 10%, only that 10% needs to be considered for speeding up the project. Of course, if the current critical paths are shortened too much, some other path will become critical, and it too must be shortened to make the earliest completion time any sooner.
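
The idea behind the earliest completion time can be sketched in Java on the first nine rows of Table 19.1. This is a minimal sketch under stated assumptions, not the scheduling algorithm of Section 19.8.4: it works directly on the precedence table (activities as items with predecessors) rather than on the event network of Figure 19.9, it assumes every activity is listed after its preceding activities, and the names `EarliestFinishSketch`, `Activity`, and `earliestFinish` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; names are not from the text.
class EarliestFinishSketch {

    // One row of a precedence table: duration in days and preceding activities.
    static final class Activity {
        final int days;
        final List<String> preceding;

        Activity(int days, String... preceding) {
            this.days = days;
            this.preceding = List.of(preceding);
        }
    }

    // Rows a1 to a9 of Table 19.1, each listed after its preceding activities.
    static Map<String, Activity> firstNineActivities() {
        Map<String, Activity> table = new LinkedHashMap<>();
        table.put("a1", new Activity(2));
        table.put("a2", new Activity(3, "a1"));
        table.put("a3", new Activity(2, "a1"));
        table.put("a4", new Activity(4, "a2", "a3"));
        table.put("a5", new Activity(4, "a4"));
        table.put("a6", new Activity(3, "a5"));
        table.put("a7", new Activity(2, "a5"));
        table.put("a8", new Activity(3, "a5"));
        table.put("a9", new Activity(10, "a6", "a7", "a8"));
        return table;
    }

    // An activity can start only when all preceding activities have finished,
    // so its earliest finish is its duration plus the largest earliest finish
    // among its preceding activities.
    static Map<String, Integer> earliestFinish(Map<String, Activity> table) {
        Map<String, Integer> finish = new LinkedHashMap<>();
        for (Map.Entry<String, Activity> row : table.entrySet()) {
            int earliestStart = 0;
            for (String p : row.getValue().preceding) {
                earliestStart = Math.max(earliestStart, finish.get(p));
            }
            finish.put(row.getKey(), earliestStart + row.getValue().days);
        }
        return finish;
    }

    public static void main(String[] args) {
        System.out.println(earliestFinish(firstNineActivities()));
    }
}
```

In this fragment, a6 and a8 have the same earliest finish (day 16) while a7 finishes a day earlier, which illustrates why either a6 or a8 can appear on a longest path.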
19.2 Basic Definitions of Graph Theory

The application of graph theory to many fields has resulted in a great diversity in terminology. Different authors often use different terms for the same concept, and, what is worse, the same term for different concepts. We will use computer science notation, and, wherever possible, we shall indicate the common alternatives used in the literature of computer science. In this section, we shall define a graph as an abstract mathematical system. However, to provide some explanation for the terminology used and also to develop some intuition for the concept of a graph, we shall represent graphs diagrammatically. Any such diagram will also be called a graph. Alternative methods of representing graphs, suitable for computer representation, will be discussed in Section 19.5.

Figure 19.9. A scheduling network for a moderate-sized software project (edges are labeled with activities a1 to a40 and dummy activities d1 to d8, together with their durations)

Figure 19.10. Pictorial representations of graphs: (a) directed graph with edge labels; (b) undirected graph with edge labels; (c) directed graph without edge labels

A graph consists of vertices joined by edges. A mathematical definition of a graph must therefore rely on V, the set of vertices, and E, the set of edges. Each edge is associated with two vertices; that is, there is a mapping from an edge to an ordered or unordered pair of vertices. This is summarized in the following definition:

Definition 19.1: A graph G = (V, E, f) consists of a nonempty set V, the set of vertices (nodes, points) of the graph, a set E, the set of edges of the graph, and a mapping f from the set of edges E to a set of ordered or unordered pairs of elements of V. If an edge is mapped to an ordered pair, it is called a directed edge; otherwise, it is called an undirected edge.

Notice that the definition of a graph implies that with every edge of the graph G, we can associate an ordered or unordered pair of vertices of the graph. If an edge e ∈ E is associated with an ordered pair (u, v) or unordered pair {u, v}, where u, v ∈ V, then we say that the edge e joins the vertices u and v. Any two vertices joined by an edge in a graph are called adjacent vertices. We shall assume throughout the discussion that both the sets V and E are finite. Usually, it will be convenient to write a graph G as (V, E), or simply as G. In the former case, each edge is directly represented as the pair that it is mapped to, which eliminates the need to specify f.

A graph in which every edge is directed is called a directed graph, or digraph. A graph in which every edge is undirected is called an undirected graph. If some edges are directed and some are undirected in a graph, the graph is called mixed. In the diagrams, a directed edge (arc) is shown as a line with an arrowhead that shows the direction. The graphs given in Figure 19.10a and Figure 19.10c are directed graphs. The one given in Figure 19.10b is undirected.
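
Definition 19.1 can be sketched directly in Java by storing V as a set and the mapping f as a map from edge names to vertex pairs, so that E is simply the key set of f. The sketch below is an assumption-laden illustration rather than the Graph ADT of Section 19.3: the class `GraphDefinitionSketch` and its members are hypothetical, every pair is treated as ordered, and the sample data are the three edges of Figure 19.10a (e1, e2, and e3 mapped to (1, 2), (2, 3), and (3, 1)).

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of G = (V, E, f); not the chapter's Graph ADT.
class GraphDefinitionSketch {
    final Set<Integer> vertices = new LinkedHashSet<>(); // V
    final Map<String, int[]> f = new LinkedHashMap<>();  // f: E -> pairs; E = f.keySet()

    void addVertex(int v) {
        vertices.add(v);
    }

    // Maps the named edge to the ordered pair (u, v); both ends must be in V.
    void addEdge(String edge, int u, int v) {
        if (!vertices.contains(u) || !vertices.contains(v)) {
            throw new IllegalArgumentException("both end vertices must belong to V");
        }
        f.put(edge, new int[] {u, v});
    }

    // Two vertices are adjacent if some edge joins them, in either direction.
    boolean adjacent(int u, int v) {
        for (int[] pair : f.values()) {
            if ((pair[0] == u && pair[1] == v) || (pair[0] == v && pair[1] == u)) {
                return true;
            }
        }
        return false;
    }

    // The directed graph of Figure 19.10a.
    static GraphDefinitionSketch figure1910a() {
        GraphDefinitionSketch g = new GraphDefinitionSketch();
        for (int v = 1; v <= 3; v++) {
            g.addVertex(v);
        }
        g.addEdge("e1", 1, 2);
        g.addEdge("e2", 2, 3);
        g.addEdge("e3", 3, 1);
        return g;
    }

    public static void main(String[] args) {
        GraphDefinitionSketch g = figure1910a();
        System.out.println("edges: " + g.f.keySet());
        System.out.println("1 adjacent to 3? " + g.adjacent(1, 3));
    }
}
```

Because edges are keyed by name rather than by their vertex pair, two different edge names may map to the same pair, which is the situation the full notation G = (V, E, f) is designed to handle.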
Notice that the edges e1, e2, and e3 in Figure 19.10a are associated with the ordered pairs (1, 2), (2, 3), and (3, 1), respectively. The set representation of the graph in Figure 19.10a is ({1, 2, 3}, {(1, 2), (2, 3), (3, 1)}). The edges e1, e2, and e3 in Figure 19.10b are associated with the unordered pairs {1, 2}, {2, 3}, and {3, 1}, respectively. In Figure 19.10b, vertex 1 is adjacent to vertices 2 and 3. Let G = (V, E) be a graph and let e ∈ E be a directed edge associated with the ordered pair of nodes (u, v). The edge e is then said to initiate or originate at the node u and
terminate or end at the node v. The nodes u and v are also called the initial and terminal nodes of the edge e. An edge e ∈ E, which joins the nodes u and v, whether it be directed or undirected, is said to be incident to the nodes u and v.

Figure 19.11. Multigraphs and loops: (a) undirected multigraph with loops; (b) directed multigraph; (c) directed graph with a loop

An edge of a graph that joins a node to itself is called a self-loop or sling, not to be confused with a loop in a program. The direction of a loop is of no significance. Hence, it can be considered either a directed or undirected edge. Some authors do not allow any loops in the definition of a graph.

The graphs given in Figure 19.10 have no more than one edge between any pair of nodes. In the case of directed edges, the two possible edges between a pair of nodes that are opposite in direction are considered distinct. In some graphs, directed or undirected, we may have certain pairs of nodes joined by more than one edge, as shown in Figure 19.11a and Figure 19.11b. Such edges are called parallel. Note that there are no parallel edges in the graph of Figure 19.11c. In Figure 19.11a, there are two parallel edges joining nodes 1 and 2, two parallel edges joining nodes 2 and 3, and two parallel loops at node 2. In Figure 19.11b, there are two parallel edges associated with the ordered pair (v1, v2).

Any graph that contains some parallel edges is called a multigraph. In this case, the mapping between edge and node pairs is not one-to-one. The shorthand notation G = (V, E) is not sufficient for representing a multigraph, and the full notation G = (V, E, f) is needed. However, if there is no more than one edge between a pair of nodes (no more than one edge with a specific direction in the case of a directed graph), such a graph is called a simple graph. In this chapter, we deal primarily with simple graphs. The example graphs given in Section 19.1 are all simple graphs.

We may have graphs in which numeric labels, called weights, are placed on the edges.
For example, a graph representing a system of pipelines might have a weight associated with each edge (pipe), which indicates the amount of some commodity that can be transferred through the pipe. Similarly, a graph of city streets may be assigned weights according to the traffic density on each street. A graph in which weights are assigned to every edge is called a weighted graph.

In a graph, a node that is not adjacent to any other node is called an isolated node. A graph containing only isolated nodes is called a null graph. In other words, the set of edges in a null graph is empty.

Definition 19.2: In a directed graph, for any node v, the number of edges that have v as their initial node is called the outdegree of node v. The number of edges that have v as their terminal node is called the indegree of v, and the sum of the outdegree and
the indegree of a node v is called its total degree. In the case of an undirected graph, the total degree or just the degree of node v is equal to the number of edges incident with v.

The total degree of an isolated node is 0 and that of a node with a loop and no other edges incident to it is 2. A simple result involving the notion of the degree of nodes of a graph is that the sum of the degrees (or total degrees in the case of a directed graph) of all the nodes of a graph must be an even number that is equal to twice the number of edges in the graph.

Some graph applications are concerned with only parts of a graph. The notion of a subset of a set is useful in formalizing what we mean by a part of a graph.

Definition 19.3: Let U be the set of nodes of a graph H and V be the set of nodes of a graph G such that U ⊆ V. If, in addition, every edge of H is also an edge of G, the graph H is called a subgraph of the graph G, which is expressed as H ⊆ G.

Naturally, the graph G and the null graph obtained from G by deleting all the edges of G are also subgraphs of G. Other subgraphs of G can be obtained by deleting certain nodes and edges of G. In Figure 19.12, the graph in part (b) is a subgraph of the graph in part (a). Let G = (V, E) be a graph and X ⊆ V. A subgraph whose nodes are given by the set X and whose edges consist of all those edges of G that have their initial and terminal nodes in X is called the subgraph induced by X. Thus, the subgraph in part (b) of Figure 19.12 is not an induced subgraph, but if the edge from the upper left node to the lower right node is included, the subgraph would be an induced subgraph of the graph in part (a).

Figure 19.12. A graph and one of its subgraphs: (a) graph; (b) subgraph

There are several special classes of simple graphs that frequently arise. We now briefly describe a few of these. A graph (V, E) is said to be complete if every node is adjacent to all other nodes in the graph. A complete graph of n nodes is denoted by K_n.
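
As a quick check of the degree-sum result stated above, the following Java sketch builds the edge list of a complete graph K_n and compares the sum of the node degrees with twice the number of edges. The names `DegreeSumSketch`, `completeGraphEdges`, and `degrees` are hypothetical. The edge count n(n - 1)/2 used in the output is a standard fact about complete graphs that the text does not state, so treat it as an added assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; names are not from the text.
class DegreeSumSketch {

    // Edges of K_n as unordered pairs {u, v} with u < v; nodes are 1..n.
    static List<int[]> completeGraphEdges(int n) {
        List<int[]> edges = new ArrayList<>();
        for (int u = 1; u <= n; u++) {
            for (int v = u + 1; v <= n; v++) {
                edges.add(new int[] {u, v});
            }
        }
        return edges;
    }

    // degree[v] counts the edge ends at node v, so a loop {v, v} adds 2.
    static int[] degrees(int n, List<int[]> edges) {
        int[] degree = new int[n + 1]; // index 0 is unused
        for (int[] e : edges) {
            degree[e[0]]++;
            degree[e[1]]++;
        }
        return degree;
    }

    static int sum(int[] values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            List<int[]> edges = completeGraphEdges(n);
            int degreeSum = sum(degrees(n, edges));
            System.out.println("K" + n + ": edges = " + edges.size()
                    + ", n(n-1)/2 = " + (n * (n - 1) / 2)
                    + ", degree sum = " + degreeSum);
        }
    }
}
```

Counting edge ends rather than edges is what makes the sum come out as exactly twice the number of edges: every edge contributes one end to each of the two nodes it joins, or two ends to the same node when it is a loop.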
Figure 19.13 shows the first five complete graphs. Another type of simple graph is a bipartite graph. A simple graph G = (V, E) is called a bipartite graph if V can be partitioned into subsets V1 and V2 such that no two nodes of V1 are adjacent and no two nodes of V2 are adjacent. Consequently, an edge cannot join two nodes in V1 or two nodes in V2. The graph in Figure 19.14a is a bipartite graph, since the two disjoint subsets of nodes are V1 = {v1, v2} and V2 = {v3, v4, v5}, and all the edges only join nodes in V1 to nodes in V2. However, Figure 19.14b is not a bipartite graph, since the nodes cannot be partitioned into two nonempty disjoint subsets where edges only join a node from one subset to a node in the other subset.

Note that the theory of binary relations is closely linked to the theory of simple digraphs. Let G = (V, E) be a simple digraph. Every edge of E can be expressed by means of an
854
Graphs
K1
K2
K3
K4
K5
Figure 19.13. Examples of complete graphs
v1
v1 v3 v4
v2 (a)
v5
v2 (b)
v3
Figure 19.14. Bipartite and nonbipartite graphs ordered pair of elements of V that is, E V V . However, any subset of V V denes a relation in V . Accordingly, E is a binary relation in V whose graph is the same as the simple digraph G.
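The definitions of this section (degree, complete graph, bipartite graph) can be checked mechanically on small examples. The following Java sketch is only an illustration: the class `SimpleGraphChecks`, its methods, and the sample edge lists are hypothetical and are not part of the Uos library. It assumes an undirected graph given as a list of node pairs, with nodes numbered from 1.

```java
import java.util.Set;

/** Illustrative checks for the definitions above (hypothetical helper, not part of the Uos library). */
final class SimpleGraphChecks {

    /** Sum of the degrees of nodes 1..n of an undirected graph given as an edge list. */
    static int degreeSum(int n, int[][] edges) {
        int[] degree = new int[n + 1];           // slot 0 unused; nodes are numbered from 1
        for (int[] e : edges) {
            degree[e[0]]++;                      // a loop {v, v} adds 2 to the degree of v
            degree[e[1]]++;
        }
        int sum = 0;
        for (int v = 1; v <= n; v++) sum += degree[v];
        return sum;
    }

    /** True if every pair of distinct nodes among 1..n is joined by an edge. */
    static boolean isComplete(int n, int[][] edges) {
        boolean[][] adj = new boolean[n + 1][n + 1];
        for (int[] e : edges) {
            adj[e[0]][e[1]] = true;
            adj[e[1]][e[0]] = true;
        }
        for (int u = 1; u <= n; u++)
            for (int v = u + 1; v <= n; v++)
                if (!adj[u][v]) return false;
        return true;
    }

    /** True if every edge joins a node of v1 to a node of v2 (the two sets are assumed to partition the nodes). */
    static boolean isBipartition(Set<Integer> v1, Set<Integer> v2, int[][] edges) {
        for (int[] e : edges) {
            boolean crosses = (v1.contains(e[0]) && v2.contains(e[1]))
                           || (v2.contains(e[0]) && v1.contains(e[1]));
            if (!crosses) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] k4 = {{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}};
        System.out.println(degreeSum(4, k4) == 2 * k4.length);   // true: degree sum is twice the edge count
        System.out.println(isComplete(4, k4));                    // true
        // Hypothetical bipartite example with V1 = {1, 2} and V2 = {3, 4, 5}.
        int[][] b = {{1, 3}, {1, 4}, {2, 4}, {2, 5}};
        System.out.println(isBipartition(Set.of(1, 2), Set.of(3, 4, 5), b));   // true
    }
}
```

Note that `isBipartition` only verifies a proposed partition; finding a partition, or showing that none exists, takes more work.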
Problems 19.2
1. Show that the sum of indegrees of all the nodes of a simple digraph is equal to the sum of the outdegrees of all its nodes and that this sum is equal to the number of edges of the graph.
2. Because graphs can be drawn in an arbitrary manner, it can happen that two diagrams which look entirely different from one another may represent the same graph. See, for example, parts (a) and (a') of Figure 19.15. Two graphs are isomorphic if there exists a one-to-one correspondence between the nodes of the two graphs that preserves adjacency of the nodes, as well as the directions of the edges, if any. According to the definition of isomorphism, we note that any two nodes in one graph which are joined by an edge must have the corresponding nodes in the other graph also joined by an edge, and, hence, a one-to-one correspondence exists between the edges as well. The graphs given in parts (a) and (a') of Figure 19.15 are isomorphic because of the existence of a mapping $1 \to u_1$, $2 \to u_3$, $3 \to u_4$, and $4 \to u_2$. Under this mapping, the edges (1, 3), (1, 2), (2, 4), and (3, 4) are mapped into (u1, u4), (u1, u3), (u3, u2), and (u4, u2), which are the only edges of the graph in Figure 19.15a'. Show that the digraphs given in parts (b) and (b') of Figure 19.15 are isomorphic.
3. Draw all possible simple undirected graphs having four nodes. Show that there is only one graph with no edges, one with one edge, two with two edges, three with three edges, two with four edges, one with five edges, and one with six edges. Assume that there are no loops and that isomorphic graphs are not distinguishable.
4. Show that the graphs given in Figure 19.16a and Figure 19.16b are isomorphic.
5. Show that the digraphs in Figure 19.17 are not isomorphic.
6. Show that a complete digraph with n nodes has the maximum number of edges, $n(n-1)$, assuming that there are no loops.

Figure 19.15. Pairs of isomorphic graphs

Figure 19.16. Isomorphic graphs
19.3 Graph ADT
The reader should have an intuitive feeling for the concept of a graph from the examples of graphs seen so far. Now, consider the development of an ADT for a graph. Our specification of the ADT will not be a formal specification (i.e., neither axiomatic nor constructive). Instead, we will use informal verbal descriptions for the Java specification of features of a graph as defined in the Uos data structure library.
Figure 19.17. Nonisomorphic graphs

ADT VertexUos
    VertexUos(int id)       // constructor for a vertex
    int index()             // index of the vertex
    String toString()       // string representation of the vertex
Figure 19.18. VertexUos ADT

The first thing to note is that every graph involves vertices or nodes and edges. These are parts of a graph, but distinct from it, so they should have their own ADT. A vertex is an abstract entity that we draw as a small circle. To distinguish one from another, it is convenient for them to have distinct labels. Thus, each vertex will be given a unique index. The public constructor for a vertex has its index as a parameter. As the index is used to guarantee vertex uniqueness, there is no operation to change the index of a vertex. Finally, a String representation of the vertex (giving its index) is available via the toString() method. Thus, the features of the VertexUos ADT are those listed in Figure 19.18.

An edge is identified by the two vertices at its ends. Rather than using the graph theory notation of initial and terminal vertices, they will be called firstItem and secondItem. This is in keeping with the approach of using the same names for the same concepts even in different contexts. Normally, the ends of one edge are distinct from the ends of every other edge, so the ends of an edge are sufficient to distinguish it from another edge. Formally, there are two types of edges: directed and undirected. Rather than having two ADTs, a directed edge (u, v) will be represented by an instance of EdgeUos with u = firstItem and v = secondItem. An undirected edge {u, v} will be represented by two directed edges: (u, v) and (v, u). The public constructor for an edge has two vertices as parameters. No methods are provided to change the ends of an edge. Other features for the EdgeUos ADT are the method toString() to return the String representation of the edge, a method to determine whether a given vertex is an end of the current edge, and a method that given one end of an edge will return the vertex at the other end. Thus, for the EdgeUos ADT, we have the features given in Figure 19.19.

Now, consider the GraphUos ADT.
A graph is a container of vertices and edges. Hence, it makes sense to have a constructor to create an empty graph with no vertices and no edges and then methods to add vertices and edges. In addition, we will take the approach that vertices and edges only exist in a graph. As a result, instead of having a method to add a vertex, the method will create a vertex to match the specification provided by the parameters and store the vertex in the graph. Edges are handled in a similar way.

ADT EdgeUos
    EdgeUos(VertexUos v1, VertexUos v2)   // constructor for an edge
    VertexUos firstItem()                 // initial vertex of the edge
    VertexUos secondItem()                // terminal vertex of the edge
    boolean has(VertexUos v)              // does the edge have vertex v as an end?
    VertexUos other(VertexUos v)          // vertex other than v that is an end of the edge
    String toString()                     // string representation of the edge
Figure 19.19. EdgeUos ADT

The vertices of a graph are stored in an array so that an individual vertex is accessed by index. As a result, a capacity, which specifies an upper limit on the number of vertices in the graph, will be associated with the graph. This capacity will be specified when the graph is created. Normally, a graph has either all directed edges, and is called a directed graph, or all undirected edges, and is called an undirected graph. Whether a graph is to be directed or undirected will be specified when it is created. These key features of the GraphUos ADT are given in part 1 of Figure 19.20. These include the container features, insertion procedures, and size measurement features. An additional key feature is the function adjacent(), which tests whether two vertices are adjacent (i.e., whether there is an edge between them).

Whereas the ADT features of part 1 are sufficient for most applications, it is convenient to have a number of additional features. First, graphs can be very large. Thus, it can be clumsy and error-prone to build a graph vertex by vertex and edge by edge. Therefore, there are two methods to read in a graph: read() for the interactive entry of a graph from the terminal, and fileRead() for the entry of a graph from a text file. The format of a graph as read by fileRead() is the same as that produced by toString(). Hence, it is easy to save a graph to a file for later retrieval. Also, it is convenient to have search methods, and a current vertex and a current edge that are set by the search methods. These additional features of the GraphUos ADT are in part 2 of Figure 19.20.

Finally, it is useful to iterate through all vertices and through all edges incident to a specified vertex. Therefore, the features of part 3 of Figure 19.20 are also included. The procedures goFirst() and goForth() iterate through the vertices of the graph.
When doing so, they set the instance variables accessible by itemExists(), item(), and itemIndex(). Iteration through the edges incident to a vertex is performed by eGoFirst() and eGoForth(), which modify the result obtained from the functions iterationIndex() (the index of the vertex being iterated), eItemExists(), eItem(), adjItem() (the adjacent vertex discovered in the iteration), and adjIndex() (the index of the adjacent vertex). Note that vertex and edge iteration are independent of each other. One does not affect the other, and the edge list being iterated is not necessarily the edge list of the current vertex. However, only one vertex iteration can be done at a time, since starting a second iteration will destroy the state (the internal storage of the present position in the iteration) of the first iteration. For the same reason, only one edge iteration can be done at a time. A position in a graph records its current vertex, its current edge, and the status of the two iterators. Methods are provided to store the current position and move to a stored position.

ADT GraphUos
(part 1)
    GraphUos(int cap)                              // make an empty undirected graph with vertex capacity cap
    GraphUos(int cap, String graphChoice)          /* Make an empty graph with vertex capacity cap, and either
                                                      directed or undirected as specified by graphChoice. */
    void vNewIth(int index)                        // insert a vertex with the specified index
    void eNew(VertexUos v1, VertexUos v2)          // insert an edge with ends v1 and v2
    void eNew(int i, int j)                        // insert an edge whose ends have indices i and j
    int capacity()                                 // the vertex capacity of the graph
    int n()                                        // current number of vertices
    int m()                                        // current number of edges
    boolean directed()                             // is the graph directed?
    boolean adjacent(VertexUos v1, VertexUos v2)   // is there an edge from v1 to v2?
    boolean isEmpty()                              // is the graph empty (no vertices)?
    boolean isFull()                               // is the graph full (n = capacity)?
    void wipeOut()                                 // remove all vertices (and edges)
    String toString()                              // string representation of the graph
(part 2)
    void read()                                    // read in a graph from the terminal
    void fileRead(File infile)                     // read in a graph from infile
    void goTo(VertexUos v)                         // move the vertex cursor to vertex v
    void goIth(int index)                          // move the vertex cursor to the vertex with index index
    boolean itemExists()                           // is there a current vertex as specified by the vertex cursor?
    VertexUos item()                               // the current item (vertex)
    int itemIndex()                                // index of the current vertex
    void eSearch(VertexUos v1, VertexUos v2)       // move the edge cursor to the edge (v1, v2)
    boolean eItemExists()                          // is there a current edge as specified by the edge cursor?
    EdgeUos eItem()                                // the current edge
(part 3)
    void goFirst()                                 // go to the first vertex
    void goForth()                                 // go to the next vertex
    boolean after()                                // moved past the last vertex?
    void eGoFirst(VertexUos v)                     // go to the first edge of v to start an edge list iteration
    void eGoForth()                                // go to the next edge of the edge list
    boolean eAfter()                               // moved past the end of the edge list?
    int iterationIndex()                           // index of the vertex being edge iterated
    VertexUos adjItem()                            // vertex of eItem other than the one whose edge list is being iterated
    int adjIndex()                                 // the index of adjItem
    GraphPositionUos currentPosition()             // the current position in the graph
    void goPosition(GraphPositionUos pos)          // go to the specified position in the graph
Figure 19.20. GraphUos ADT

As an example of using the GraphUos constructors and members, Figure 19.21a has a sequence of operations to build and print a graph. For simplicity of presentation, the ADT for a graph was not defined to use generic types. However, the actual implementation that we will use will have two generic parameters. Figure 19.21a shows the specification of the generic parameters for the graph. The graph that is built is shown in Figure 19.21b, along with the result of the toString() operation in Figure 19.21c. The first line in the toString() output gives the number of vertices in the graph (i.e., 4) and whether the graph is directed or undirected. Each subsequent line gives the index of a vertex and its associated list of adjacent vertices. Each adjacent vertex is specified by its index. A zero denotes the end of a list. The output is terse so that a graph can be written to a file using toString() and read later using the fileRead() procedure.

As another example of using these operations, consider the construction of a special matrix called the adjacency matrix, A, of a graph. The adjacency matrix is defined as follows:
Definition 19.4: Let G = (V, E) be a graph in which $V = \{v_1, v_2, \ldots, v_n\}$ and the vertices are assumed to be ordered from $v_1$ to $v_n$. The $n \times n$ matrix A whose elements $a_{ij}$ are given by

$$a_{ij} = \begin{cases} 1 & \text{if } (v_i, v_j) \in E \\ 0 & \text{otherwise} \end{cases}$$

is called the adjacency matrix of the graph G.

(a) Sequence of operations:
    GraphLinkedRepUos<VertexUos, EdgeUos<VertexUos>> g =
        new GraphLinkedRepUos<VertexUos, EdgeUos<VertexUos>>(...);
    g.vNewIth(1);
    g.vNewIth(2);
    g.goIth(1);
    VertexUos v = g.item();
    g.goIth(2);
    g.eNew(v, g.item());
    g.vNewIth(3);
    g.goIth(3);
    g.eNew(v, g.item());
    g.eNew(g.item(), v);
    g.vNewIth(4);
    g.goIth(4);
    g.eNew(v, g.item());
    System.out.println(g);
(b) The graph built by the operations in (a), on vertices 1 through 4
(c) Output of toString():
    4 directed
    1: 2 3 4 0
    2: 0
    3: 1 0
    4: 0
(d) Adjacency matrix:
$$a = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
Figure 19.21. Graph built from a sequence of operations

Figure 19.21d gives the adjacency matrix a of the constructed graph. Note that $a_{ij}$ indicates whether there is an edge directed from $v_i$ to $v_j$. If there is an undirected edge between $v_i$ and $v_j$, then $a_{ij} = 1$ and $a_{ji} = 1$. From the adjacency matrix by itself, it is not possible to determine whether $a_{ij} = 1$ and $a_{ji} = 1$ means one undirected edge or two directed edges. However, for most purposes, the distinction is not required.

For a given graph, G = (V, E), the adjacency matrix depends on the ordering of the elements of V. For different orderings of the elements of V, we get different adjacency matrices for the same graph G. However, any one of the adjacency matrices of G can be obtained from another adjacency matrix of the same graph by interchanging some of the rows and corresponding columns of the matrix. We shall neglect the arbitrariness introduced in an adjacency matrix owing to the ordering of the elements of V, and consider any adjacency matrix of a graph to be the adjacency matrix of the graph.

The adjacency matrix a of a graph G is easily found using the graph ADT operations. First initialize the matrix to all 0s and then place a 1 in the location corresponding to each edge. For a given vertex, its edges are easily found by using the edge iterator. All the vertices are easily found by using the vertex iterator. By nesting these two iterators, all the edges can be found and the adjacency matrix a can be generated. The code for doing this is shown in Figure 19.22. The code uses the fact that Java initializes all the elements in an array to contain zeroes.
Note that the iterators modify the state of the graph object so that the functions item(), iterationIndex(), and adjIndex() generally return different values after an iteration step. Rather than storing 1 directly in the appropriate location of a, the method uses the method matrixSetItem(). This is done because standard graph notation assumes that the vertices have indices 1 through n, as is standard in the field of graph theory, while Java starts the indices of an array at 0. Methods like matrixSetItem() and matrixItem() can be used to confine handling the mapping of 1 through n to 0 through n - 1 to two methods.

    public int[][] computeAdjacencyMatrix(GraphUos<VertexUos, EdgeUos<VertexUos>> g)
    {
        int[][] a = new int[g.capacity()][g.capacity()];
        g.goFirst();
        while (!g.after())
        {
            g.eGoFirst(g.item());
            while (!g.eAfter())
            {
                matrixSetItem(a, 1, g.iterationIndex(), g.adjIndex());
                g.eGoForth();
            }
            g.goForth();
        }
        return a;
    }

    /** Set the item with indices i and j.
        Analysis: Time = O(1) */
    public void matrixSetItem(int[][] a, int x, int i, int j)
    {
        /* Indices 1 through n are mapped to 0 through n - 1. */
        a[i-1][j-1] = x;
    }

    /** The item with indices i and j.
        Analysis: Time = O(1) */
    public int matrixItem(int[][] a, int i, int j)
    {
        /* Indices 1 through n are mapped to 0 through n - 1. */
        return a[i-1][j-1];
    }
Figure 19.22. Code to compute the adjacency matrix of a graph

The time to create and initialize the matrix is $\Theta(n^2)$. For the nested loops, the time depends on the efficiency of the iteration operations. For any reasonable representation of the graph, the n vertices can be found in $\Theta(n)$ time, and all the edges incident to a given vertex can also be found in $O(n)$ time. Thus, $\Theta(n^2)$ also bounds the time for these loops and the whole algorithm.
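The link between the terse toString() output and the adjacency matrix can also be illustrated without the graph library. The following Java sketch is an assumption-laden illustration, not the library's fileRead(): the class `AdjacencyMatrixSketch` and the method `fromText` are hypothetical names, and the exact text layout (a header line starting with the vertex count, then one line per vertex of the form "index: neighbors 0") is inferred from Figure 19.21c.

```java
import java.util.Arrays;

/** Hypothetical helper: builds a 0/1 adjacency matrix from the terse text format. */
final class AdjacencyMatrixSketch {

    static int[][] fromText(String text) {
        String[] lines = text.trim().split("\\R");
        // Header line: "<n> directed" or "<n> undirected"; only n is needed here.
        int n = Integer.parseInt(lines[0].trim().split("\\s+")[0]);
        int[][] a = new int[n][n];                               // Java zero-fills new int arrays
        for (int k = 1; k < lines.length; k++) {
            String[] parts = lines[k].trim().split("[:\\s]+");   // "1: 2 3 4 0" -> 1, 2, 3, 4, 0
            int i = Integer.parseInt(parts[0]);
            for (int p = 1; p < parts.length; p++) {
                int j = Integer.parseInt(parts[p]);
                if (j == 0) break;                               // a zero denotes the end of a list
                a[i - 1][j - 1] = 1;                             // indices 1..n are mapped to 0..n-1
            }
        }
        return a;
    }

    public static void main(String[] args) {
        String text = "4 directed\n1: 2 3 4 0\n2: 0\n3: 1 0\n4: 0\n";
        for (int[] row : fromText(text)) {
            System.out.println(Arrays.toString(row));
        }
        // Prints:
        // [0, 1, 1, 1]
        // [0, 0, 0, 0]
        // [1, 0, 0, 0]
        // [0, 0, 0, 0]
    }
}
```

As in Figure 19.22, the only place where the shift from 1-based vertex indices to 0-based array indices occurs is the single assignment into the matrix.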
Problems 19.3
1. Using the operations of the GraphUos ADT, give an algorithm to count the number of edges in a graph. Do not assume that this value is stored in attribute m. Note: If the graph is undirected, be careful not to obtain a value that is twice as large as it should be.
2. Using the operations of the GraphUos ADT, give an algorithm to determine the degree of each vertex.
3. Using the operations of the GraphUos ADT, give a function to test whether the graph is complete.
19.4 Paths, Reachability, and Connectedness

In this section, we will introduce some additional terminology associated with a simple directed graph (digraph). During the course of our discussion, we shall also indicate how the same terminology and concepts can be extended to simple undirected graphs. In a graph, one of the key concepts is that of a path. Consider the digraph given in Figure 19.23. Some of the paths originating in node 1 and ending in node 12 are

$P_1 = \langle 1, 2, 3, 12 \rangle$
$P_2 = \langle 1, 2, 3, 4, 5, 6, 11, 3, 12 \rangle$
$P_3 = \langle 1, 2, 3, 4, 5, 7, 8, 10, 11, 3, 12 \rangle$
$P_4 = \langle 1, 2, 3, 4, 5, 7, 9, 10, 11, 3, 4, 5, 6, 11, 3, 12 \rangle$

where $\langle \ldots \rangle$ is used to represent a sequence. Although this specifies that a path is a sequence of nodes, formally a path is a sequence of edges.

Definition 19.5: Let G = (V, E) be a simple digraph. A sequence of edges is called a path of G if and only if the terminal node of each edge in the path is the initial node of the next edge, if any, in the path.

Thus, a path has the form

$\langle (v_{i_1}, v_{i_2}), (v_{i_2}, v_{i_3}), \ldots, (v_{i_{k-2}}, v_{i_{k-1}}), (v_{i_{k-1}}, v_{i_k}) \rangle$,

where it is assumed that all the nodes and edges appearing in the path are in V and E, respectively. As we have seen, it is customary to write such a path as the sequence $\langle v_{i_1}, v_{i_2}, \ldots, v_{i_{k-1}}, v_{i_k} \rangle$. Note that not all edges or nodes appearing in a path need be distinct. Also, for a given graph, any arbitrary set of nodes written in any order does not necessarily give a path. In fact, each node appearing in the path must be adjacent to the nodes appearing just before and after it, except in the case of the first and last nodes. We now elaborate on this notion. A path is said to traverse through the nodes appearing in the sequence, originating at the initial node of the first edge and ending at the terminal node of the last edge in the sequence.

Definition 19.6: The number of edges appearing in the sequence of a path is called the length of the path.

Note that this definition differs from the definition of the length of a path in a tree. In a tree, the length of a path is the number of nodes traversed by the path. The tree definition is used to reflect the number of tests that must be done to find a value in an ordered binary tree. For a graph, a path represents a route from one node to another node by means of direct links (edges). The normal measure of the length of a route is the sum of the lengths of the direct links. Hence, for a graph with no distances associated with its edges, the length of a path is the number of edges.

Definition 19.7: A path in a digraph is called a simple path if all the edges are distinct. A path in which all the nodes are distinct is called an elementary path. As a special case, a path is still called elementary if the originating and ending nodes are the same.
(a) Module:
    void whatever()
    {
        s1
        s2
        while (!finished)
        {
            s3
            if (flag1 == 0)
            {
                s4
                s5
            }
            else
            {
                if (flag2 == 0)
                    s6
                else
                    s7
                s8
            }
        }
        s9
    }
(b) Flow graph with nodes 1 (s1), 2 (s2), 3 (while (!finished)), 4 (s3), 5 (flag1 == 0), 6 (s4, s5), 7 (flag2 == 0), 8 (s6), 9 (s7), 10 (s8), 11 (end of loop statement), and 12 (s9, end of procedure); the edges leaving a test are labeled T and F.
Figure 19.23. Modeling a module with a flow graph

According to the definition, a path is called simple if no edge is repeated (edge simple), and a path is called elementary if no node is repeated (node simple), except that the first and last nodes can be the same. Obviously, every elementary path of a digraph is also simple. The previous paths $P_1$, $P_2$, and $P_3$ of the digraph in Figure 19.23 are all simple, but the paths $P_2$ and $P_3$ are not elementary, and $P_4$ is not even simple.

Definition 19.8: A path that originates and ends in the same node is called a cycle (circuit). A cycle is called simple if no edge in the cycle appears more than once in the path (i.e., the path is simple). A cycle is called elementary if the path is elementary.

For an elementary cycle, the initial node appears twice (at the start and at the end), and the other nodes appear exactly once. The following are some of the cycles in the graph of Figure 19.23:

$C_1 = \langle 3, 4, 5, 6, 11, 3 \rangle$
$C_2 = \langle 3, 4, 5, 7, 8, 10, 11, 3 \rangle$
$C_3 = \langle 3, 4, 5, 6, 11, 3, 4, 5, 6, 11, 3 \rangle$
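The distinction between simple and elementary paths can be checked mechanically on the flow graph. The following Java sketch is illustrative only: the class `PathKinds` and its methods are hypothetical names, and the edge list is an assumption, read off the control flow of whatever() in Figure 19.23a rather than taken from the drawing in part (b).

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper that classifies node sequences on the flow graph of Figure 19.23. */
final class PathKinds {

    // Assumed edge set, derived from the control flow of whatever().
    static final int[][] EDGES = {
        {1, 2}, {2, 3}, {3, 4}, {3, 12}, {4, 5}, {5, 6}, {5, 7}, {6, 11},
        {7, 8}, {7, 9}, {8, 10}, {9, 10}, {10, 11}, {11, 3}
    };

    static boolean hasEdge(int u, int v) {
        for (int[] e : EDGES) if (e[0] == u && e[1] == v) return true;
        return false;
    }

    /** A path: every consecutive pair of nodes is joined by an edge in that direction. */
    static boolean isPath(int... nodes) {
        for (int k = 0; k + 1 < nodes.length; k++)
            if (!hasEdge(nodes[k], nodes[k + 1])) return false;
        return nodes.length >= 2;
    }

    /** Simple: a path in which no edge is used twice. */
    static boolean isSimple(int... nodes) {
        if (!isPath(nodes)) return false;
        Set<Long> used = new HashSet<>();
        for (int k = 0; k + 1 < nodes.length; k++) {
            long key = nodes[k] * 1000L + nodes[k + 1];   // encodes the ordered pair (node numbers < 1000)
            if (!used.add(key)) return false;
        }
        return true;
    }

    /** Elementary: a path in which no node repeats, except that the first and last may coincide. */
    static boolean isElementary(int... nodes) {
        if (!isPath(nodes)) return false;
        Set<Integer> seen = new HashSet<>();
        for (int k = 0; k < nodes.length; k++) {
            boolean closing = (k == nodes.length - 1) && nodes[k] == nodes[0];
            if (!closing && !seen.add(nodes[k])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] p2 = {1, 2, 3, 4, 5, 6, 11, 3, 12};
        System.out.println(isSimple(p2));                       // true: no edge repeats
        System.out.println(isElementary(p2));                   // false: node 3 appears twice
        System.out.println(isElementary(3, 4, 5, 6, 11, 3));    // true: only the end nodes coincide
    }
}
```

Under the assumed edge set, the checks agree with the classification of the paths and cycles given in the text.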
Observe that any path which is not elementary contains cycles traversing through those nodes that appear more than once in the path. By deleting such cycles, one can obtain an elementary path. For example, in the path $C_3$, if we delete the cycle $\langle (3, 4), (4, 5), (5, 6), (6, 11), (11, 3) \rangle$, we obtain the path $C_1$, which also originates at 3, ends in 3, and is an elementary path. Some authors use the term path to mean only the elementary paths, and they likewise apply the notion of the length of a path only to elementary paths.

The digraphs (directed graphs) generated by many applications never contain cycles. For instance, scheduling graphs never contain cycles. This type of graph has led to the following definition:

Definition 19.9: A simple digraph that does not have any cycles is called acyclic.

Naturally, acyclic graphs cannot have any self-loops. Examples of acyclic graphs are given in Figure 19.24.

Figure 19.24. Examples of acyclic graphs

If there is a path from vertex u to vertex v, v is said to be reachable from u. Note that the concept of reachability is independent of the number of alternate paths from u to v and also of the lengths of the paths. For the graph of Figure 19.23, we have paths $P_1$ and $P_3$ from node 1 to node 12. Any one of these paths is sufficient to establish the reachability of node 12 from node 1.

If a node v is reachable from the node u, a path of minimum length from u to v is called a shortest path, or minimum-length path. The length of a shortest path from node u to node v is called the distance between them and is denoted by d(u, v). Some properties of the distance function are

$d(u, v) \geq 0$
$d(u, u) = 0$
$d(u, v) + d(v, w) \geq d(u, w)$

The second equality follows from assuming that a trivial zero-length path exists from every node to itself. The last inequality is called the triangle inequality. If v is not reachable from u, then it is customary to write $d(u, v) = \infty$. Note also that if v is reachable from u, it does not imply that u is reachable from v. Moreover, even if each one can reach the other, d(u, v) is not necessarily equal to d(v, u). Finally, in a simple digraph, the length of any elementary path is less than or equal to $n - 1$, where n is the number of nodes in the graph. This follows, since there are only n distinct nodes and every node in an elementary path must be distinct.

Let us now briefly see how the concepts of path and cycle can be extended to undirected graphs:
Definition 19.10: In a simple undirected graph, a sequence $\langle v_1, v_2, \ldots, v_d \rangle$ forms a path if for i = 2, 3, ..., d there is an undirected edge $\{v_{i-1}, v_i\}$. The edge $\{v_{i-1}, v_i\}$ is said to be on the path. The length of the path is given by the number of edges on the path, which in this case is $d - 1$. If $v_1 = v_d$, the path forms a cycle.

Figure 19.25. Example of a disconnected graph

Recall that the definition of a path for a directed graph requires that the edges appearing in the sequence must have specific initial and terminal nodes. The terminal node of an edge must be the same as the initial node of the next edge. In the case of a simple undirected graph, an edge is given by an unordered pair, and either of the nodes in the unordered pair can be considered as the initial or terminal node of the edge. To apply the definition of a path in a directed graph to an undirected graph, we can consider every edge in an undirected graph to be replaced by two directed edges in opposite directions. Once this replacement is done, the result is a directed graph, and the definitions of path, cycle, elementary path, simple path, and so on, are carried over to undirected graphs.

Cycles in undirected graphs are slightly different from those in digraphs. For example, in an undirected graph, we do not consider the path $\langle v_1, v_2, v_1 \rangle$ a simple cycle when $\{v_1, v_2\}$ is an edge. Traversing an undirected edge in one direction and also in the other direction is considered to have traversed the same edge twice. More generally, we do not consider the traversal of a sequence of edges in a forward direction and then in the reverse direction a simple cycle. A simple cycle in an undirected graph must itself be a simple path.

We shall now introduce an important concept: the connectedness of nodes in a graph.

Definition 19.11: In an undirected graph, two nodes are said to be connected if the two nodes are reachable from one another (i.e., there is a path between them).
Moreover, an undirected graph is said to be connected if every pair of nodes of the graph is connected.

Although most graphs are connected, including the ones seen so far in this chapter, not all graphs are connected. For example, the graph in Figure 19.25 is not connected. It has four parts, called connected components, each of which is connected and disjoint from the others.

The notion of connectedness cannot be applied to directed graphs without some further modifications. This is because in a directed graph if a node u is reachable from another node v, the node v may not be reachable from u. As a result, there are three notions of connectedness in directed graphs. Two nodes are called strongly connected if each is reachable from the other. For example, in Figure 19.23, nodes 6 and 8 are strongly connected. Two nodes are called unilaterally connected if one of the nodes can reach the other. In Figure 19.23, node 1 is unilaterally connected to node 12, but they are not strongly connected. Note that two strongly connected nodes are automatically unilaterally connected, since each can reach the other. Finally, two nodes in a directed graph are called weakly connected if they would be connected in the undirected graph formed by converting each directed edge to an undirected edge. For all the digraphs given so far, every node is weakly connected to every other one in the digraph. These notions can be used to define strongly connected components, unilaterally connected components, and weakly connected components in a directed graph. This will be pursued further in Section 19.8.1.

Figure 19.26. A digraph for path analysis

Figure 19.27. A digraph for degree analysis
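Strong and unilateral connectedness both reduce to reachability, so the two examples cited for Figure 19.23 can be verified with a small reachability table. The following Java sketch is a hedged illustration: the class `Connectedness` and its methods are hypothetical, the edge list is again an assumption derived from the control flow of whatever(), and the stack-based search is just one simple way to compute reachability (traversal algorithms are treated properly later in the chapter).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: reachability on the flow graph of Figure 19.23. */
final class Connectedness {

    static final int N = 12;   // nodes are numbered 1..12

    // Assumed edge set, derived from the control flow of whatever().
    static final int[][] EDGES = {
        {1, 2}, {2, 3}, {3, 4}, {3, 12}, {4, 5}, {5, 6}, {5, 7}, {6, 11},
        {7, 8}, {7, 9}, {8, 10}, {9, 10}, {10, 11}, {11, 3}
    };

    /** reach[u][v] is true if v is reachable from u; every node reaches itself by a zero-length path. */
    static boolean[][] reachability() {
        boolean[][] reach = new boolean[N + 1][N + 1];
        for (int s = 1; s <= N; s++) {
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(s);
            reach[s][s] = true;
            while (!stack.isEmpty()) {
                int u = stack.pop();
                for (int[] e : EDGES) {
                    if (e[0] == u && !reach[s][e[1]]) {
                        reach[s][e[1]] = true;      // first time e[1] is found from s
                        stack.push(e[1]);
                    }
                }
            }
        }
        return reach;
    }

    /** Strongly connected: each node can reach the other. */
    static boolean strongly(boolean[][] r, int u, int v) {
        return r[u][v] && r[v][u];
    }

    /** Unilaterally connected: at least one of the nodes can reach the other. */
    static boolean unilaterally(boolean[][] r, int u, int v) {
        return r[u][v] || r[v][u];
    }

    public static void main(String[] args) {
        boolean[][] r = reachability();
        System.out.println(strongly(r, 6, 8));        // true
        System.out.println(unilaterally(r, 1, 12));   // true
        System.out.println(strongly(r, 1, 12));       // false: node 12 has no outgoing edge
    }
}
```

Weak connectedness could be tested the same way after adding the reverse of every edge to the edge list.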
Problems 19.4
1. Give three different elementary paths from v1 to v3 for the digraph given in Figure 19.26. What is the shortest distance between v1 and v3? Is there a cycle in the graph?
2. Find all the indegrees and outdegrees of the nodes of the graph given in Figure 19.27. Give all the elementary cycles of this graph. Obtain an acyclic digraph by deleting one edge of the given digraph. List all the nodes that can reach every other node in the digraph.
3. For the graph of Figure 19.25, determine all the elementary paths from node 1 to node 8. Also, determine all the elementary paths and all the simple paths from node 4 to node 10.
4. For each node of the graph of Figure 19.25, determine the set of nodes connected to it.
19.5 Graph Representations
A graph consists of vertices, edges, and (possibly) labels or weights (or both) on the vertices and edges. For small graphs, a diagram can display this information in a convenient way for human use. However, graphs are usually so large that a diagram becomes an incomprehensible maze, and also a nondiagrammatic representation is needed for internal storage. Thus, this section will discuss two ways to internally represent a graph. It will also define the graph class GraphUos that corresponds to the ADT of Section 19.3.

We begin with the vertices of a graph. If the vertices only consist of the consecutive integers 1, 2, ..., n, a vertex can be implicitly represented by a variable with a value between 1 and n. For simple applications, this may be sufficient. However, in more complex applications, there is more to a vertex than an index. For example, often a vertex has a name as well as its index. When a vertex has more features than its index, an object should be used to represent the vertex. The class for objects that represent vertices, shown in Figure 19.28, is a straightforward implementation of the VertexUos ADT.

In a graph, the vertices need to be stored in some container, unless they are implicitly represented by an integer value. The container usually used is an array for the following three reasons:
1. The number of vertices is often known or at least bounded.
2. Each vertex usually has a unique index that can be used as an index into the array.
3. Fast vertex access is achieved by accessing each vertex by its index.
Hence, the following will be used:
V[ ] vertexArray;
where V is the generic type to represent the vertex type. Other containers could be used to store the vertices. A linked list could be used, but it would not permit O(1) access to a vertex, and the flexibility of linked lists in handling insertions and deletions is usually not needed. If vertices have names, then a linear search would be needed to find a vertex by its name. Other data structures could be used with the name field as a key; for example, an ordered tree or a hash table. However, it is usually sufficient to have fast access of vertices by index.

The other important part of a graph is its edges. An individual edge consists of a pair of vertices as per the ADT. In some graph representations, no object is created to represent an edge, so no edge class exists. However, having an edge class provides additional flexibility, so we define one. Our edge class, EdgeUos, is defined in terms of class PairUos, as shown in Figure 19.29. Note that the class is defined to have one generic parameter to represent the vertex type. Normally, the vertex type will be VertexUos, but if additional fields or methods are needed for a vertex, then a descendant of VertexUos should be defined, and the edge should store the descendant type. Using a generic type for the vertex type of an edge makes it easy to use such a descendant vertex type. By having a class for an edge, a descendant of the edge class can be easily defined to give an edge a label or weight. The method toStringGraphIO() with a vertex parameter is used to output the index of the other end of an edge when printing the list of adjacent vertices of a given vertex.

A graph can have many edges that need to be stored in one container, say, an array or a linked list. However, the best storage representation is one that facilitates the common or frequent operations. For a graph, the common operations are testing for the presence of a specific edge, the function adjacent(v1, v2) of the ADT, and finding the edges incident to a specific vertex. If all the edges were stored in one array or one linked list, the whole data structure (or at least much of it) would need to be searched to do either of these common operations. With m edges, it would mean that the operations could require time O(m). For a graph with n vertices, there can be as many as $n(n-1)$ edges, one for each possible ordered pair. This would mean that these common operations might require time $O(n^2)$, which is often unacceptable for large graphs.

Depending on which of the common graph operations is used the most, there are two common representations for the storage of the edges of a graph. In the Uos data structure library, there is a class for each representation and a common abstract ancestor class, GraphUos, with the specification of the common aspects. Part of the class GraphUos is
package dslib.graph;

/** A vertex of a graph where the vertex has a unique index. */
public class VertexUos implements Cloneable {
    /** The index of this vertex. */
    protected int index;

    /** Construct a new vertex with id as its index.
        Analysis: Time = O(1)
        @param id index of the new vertex */
    public VertexUos(int id) {
        index = id;
    }

    /** The index of this vertex.
        Analysis: Time = O(1) */
    public int index() {
        return index;
    }

    /** String representation of the vertex.
        Analysis: Time = O(1) */
    public String toString() {
        return String.valueOf(index);
    }

    /** A shallow clone of this vertex.
        Analysis: Time = O(1) */
    public VertexUos clone() {
        try {
            return (VertexUos) super.clone();
        } catch (CloneNotSupportedException e) {
            /* Should not occur: implements Cloneable. */
            e.printStackTrace();
            return null;
        }
    }
}
Figure 19.28. The VertexUos class
package dslib.graph;

import dslib.base.PairUos;
import dslib.exception.ItemNotFoundUosException;

/** An edge of a graph that has firstItem and secondItem as its ends.
    Method has tests whether a vertex is an end of the edge, and
    method other returns the other end. */
public class EdgeUos<V extends VertexUos> extends PairUos<V, V> {
    /** Construct a new Edge object.
        Analysis: Time = O(1)
        @param v1 initial vertex of the new edge
        @param v2 terminal vertex of the new edge */
    public EdgeUos(V v1, V v2) {
        super(v1, v2);
    }

    /** Determine if v is an item of the edge.
        Analysis: Time = O(1)
        @param v vertex to determine whether it belongs to the edge */
    public boolean has(V v) {
        return ((v == firstItem) || (v == secondItem));
    }

    /** The other item of the edge.
        Analysis: Time = O(1)
        PRECONDITION: has(v)
        @param v the vertex whose adjacent vertex via this edge is returned */
    public V other(V v) throws ItemNotFoundUosException {
        if (!has(v))
            throw new ItemNotFoundUosException("Cannot return the other item "
                    + "since this vertex does not exist.");
        if (firstItem == v)
            return secondItem;
        else // must have (secondItem == v)
            return firstItem;
    }

    /** String representation of the edge for output of v's adjacency list in a graph.
        Analysis: Time = O(1) */
    public String toStringGraphIO(V v) {
        return Integer.toString(other(v).index());
    }

    /** A shallow clone of this edge.
        Analysis: Time = O(1) */
    public EdgeUos<V> clone() {
        return (EdgeUos<V>) super.clone();
    }
}
Figure 19.29. The EdgeUos class (part 2)
shown in Figure 19.30. The next paragraphs discuss some aspects of this class. Note that only part of the class is shown, as it is a large class, and many of its methods are either abstract or straightforward to implement. In particular, no storage is specified for edges, and very few of the edge operations are implemented.

The first thing to note about the class GraphUos is that it is generic with a parameter for the vertex type and a parameter for the edge type. Normally, these two types will be VertexUos and EdgeUos. However, since it is often useful to use descendants of them, the class is defined to use generic types. The fields vertexType and edgeType will be explained shortly, but for now it is sufficient to know that they store the String names for the vertex type and edge type.

The class GraphUos has 12 constructors, although only eight of them are shown in the figure. There are only four basic objectives of the constructors. Four constructors are used to create an empty graph. This graph will have an upper limit on its number of vertices, to be called its capacity. The empty graph has no vertices and no edges. Two constructors are used to read a graph from a text file. It is assumed that the input for the graph has the same format as is produced by the toString method. This format is shown in Figure 19.21c. The next two constructors are used to read a graph from the console. The user is prompted to enter the graph information. The edges are entered for one vertex at a time. The last four constructors are used to create a random graph. They will be discussed shortly.

The first two constructors shown in the figure for the class GraphUos are used to construct an empty graph with a specific capacity. The first constructor is shorthand for the second one. The second one shows the creation of the vertex array, and the choice being made whether the graph is to be directed or undirected.
It also invokes the method createEdgeDataStructure, which creates the data structure to hold the edges. This method is abstract in class GraphUos, as this class does not deal with how to store the edges. The constructor parameters vertexTypeV and edgeTypeE are assigned to the fields vertexType and edgeType, which are used for vertex and edge construction. These fields store the fully qualified names of the vertex class and the edge class. The first constructor simply invokes the second constructor with the two parameters for the default types.

The next two constructors of the figure are used to read a graph from a text file. Again, the first of these assumes the vertex and edge types are the default ones, while for the second constructor the user can specify other types for the vertices and edges. All the work of reading the graph is placed in the method fileRead. If the user needs to modify the file format, for example to supply additional information, then this method can be overridden.

The following two constructors are for reading the graph from the console. Again, the default types are used for vertices and edges with the first of these constructors, while the second gives the user the chance to specify other types. Again, the bulk of the work is put in a method, which is called read.

The last two constructors of the figure are for the generation of a random graph. One parameter specifies the number of vertices for the graph, while another specifies the average degree of a vertex. The second constructor shows the code for creating a random graph. The method uses the average degree to compute the probability of an individual edge, called the density. Given the density, each edge is generated with this probability. The method takes care not to generate any self-loops, that is, (i, i) edges, and each undirected edge is only considered when j < i.

Following the constructors, the figure shows the methods to create a vertex and to create an edge.
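The random-graph construction described above can be sketched without the Uos classes. The following is a minimal sketch, assuming a plain boolean adjacency matrix with 0-based indices in place of GraphUos; the class name RandomGraphSketch, the method randomAdjacency, and the seed parameter are hypothetical. As in the constructor, the density is aveDegree / (n - 1), self-loops are skipped, and an undirected pair is only considered when j < i.

```java
import java.util.Random;

/** Hypothetical sketch of random graph generation; not part of dslib. */
class RandomGraphSketch {
    /**
     * Build a random adjacency matrix for n vertices in which each
     * possible edge is present with probability aveDegree / (n - 1).
     */
    static boolean[][] randomAdjacency(int n, double aveDegree,
                                       boolean directed, long seed) {
        boolean[][] adj = new boolean[n][n];
        if (n < 2)
            return adj;                              // no edge is possible
        double density = aveDegree / (n - 1);        // probability of an edge
        Random randGen = new Random(seed);
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                // directed: every ordered pair except self-loops;
                // undirected: each unordered pair once, when j < i
                boolean candidate = directed ? (i != j) : (j < i);
                if (candidate && randGen.nextDouble() <= density) {
                    adj[i][j] = true;
                    if (!directed)
                        adj[j][i] = true;            // same edge seen from j
                }
            }
        }
        return adj;
    }

    public static void main(String[] args) {
        boolean[][] adj = randomAdjacency(6, 2.0, false, 42L);
        int edges = 0;
        for (int i = 0; i < adj.length; i++)
            for (int j = 0; j < i; j++)
                if (adj[i][j])
                    edges++;
        System.out.println("undirected edges generated: " + edges);
    }
}
```

Each vertex has n - 1 possible neighbours, and each is chosen with probability aveDegree / (n - 1), so the expected degree (out-degree, for a directed graph) of a vertex is aveDegree.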
The method createNewVertex shows the use of the vertexType field. First, note that an instance of a generic type cannot be directly created since its actual type is unknown. However, the class GraphUos was defined with a generic type for the vertices,
package dslib.graph;

import dslib.base.*;
import dslib.exception.*;
import java.io.*;
import java.util.Random;
import java.util.Scanner;

/** A graph class that is the base of matrix and linked graph representations.
    The constructors provide the user with a choice of directed or undirected
    graph structures. In addition, a graph can be constructed empty (no vertices
    or edges), read from the console, read from a text file, or randomly generated.
    This class has all the capabilities of a LinearIterator (goBefore, goFirst,
    goForth, and goAfter) so the vertices of the graph can be iterated in a
    linear manner. This class defines various abstract methods for traversing
    edges of the current vertex (eGoFirst, eGoForth, and eSearch), which are
    implemented in matrix and linked graph representations. Classes that extend
    this class need to override the createNewEdge or createNewVertex methods,
    if they use specialized edges or vertices. */
public abstract class GraphUos<V extends VertexUos, E extends EdgeUos<V>>
        implements ContainerUos, LinearIteratorUos<V>, CursorSavingUos {
    /** Array that stores each vertex in the position corresponding to its index. */
    protected V[] vertexArray;

    /** Is the graph directed? Defaults to undirected. */
    protected boolean directed = false;

    /** Is the graph directed?
        Analysis: Time = O(1) */
    public boolean directed() {
        return directed;
    }

    /** The actual type of the vertices (extends VertexUos). */
    protected String vertexType;

    /** The actual type of the edges (extends EdgeUos). */
    protected String edgeType;

    /** Construct a new graph with capacity for up to cap vertices, with vertex type
        dslib.graph.VertexUos and edge type dslib.graph.EdgeUos, and make it
        directed or undirected according to graphChoice.
        Analysis: Time = O(cap)
        PRECONDITION: <ul>
            ((graphChoice.charAt(0) == 'u') || (graphChoice.charAt(0) == 'U'))
            || ((graphChoice.charAt(0) == 'd') || (graphChoice.charAt(0) == 'D')) </ul>
        @param cap Maximum number of vertices allowed in the graph
        @param graphChoice Indicates whether the graph is directed or undirected */
    public GraphUos(int cap, String graphChoice) throws InvalidArgumentUosException {
        this("dslib.graph.VertexUos", "dslib.graph.EdgeUos", cap, graphChoice);
    }
Figure 19.30. Part of the GraphUos class (part 1)
    /** Construct a new graph with capacity for up to cap vertices, with vertex type
        vertexTypeV and edge type edgeTypeE, and make it directed or undirected
        according to graphChoice.
        Analysis: Time = O(cap)
        PRECONDITION: <ul>
            ((graphChoice.charAt(0) == 'u') || (graphChoice.charAt(0) == 'U'))
            || ((graphChoice.charAt(0) == 'd') || (graphChoice.charAt(0) == 'D')) </ul>
        @param vertexTypeV The name of the type for the vertices of the graph
        @param edgeTypeE The name of the type for the edges of the graph
        @param cap Maximum number of vertices allowed in the graph
        @param graphChoice Indicates whether the graph is directed or undirected */
    @SuppressWarnings("unchecked")
    public GraphUos(String vertexTypeV, String edgeTypeE, int cap, String graphChoice)
            throws InvalidArgumentUosException {
        vertexType = vertexTypeV;
        edgeType = edgeTypeE;
        vertexArray = (V[]) new VertexUos[cap];
        if ((graphChoice.charAt(0) == 'u') || (graphChoice.charAt(0) == 'U'))
            directed = false;
        else if ((graphChoice.charAt(0) == 'd') || (graphChoice.charAt(0) == 'D'))
            directed = true;
        else
            throw new InvalidArgumentUosException("Invalid argument--graphChoice "
                    + "must be directed or undirected only");
        createEdgeDataStructure();
    }

    /** Construct the data structure to hold the edges of the graph. */
    protected abstract void createEdgeDataStructure();

    /** Construct a new graph with the graph read from a text file, vertex type
        dslib.graph.VertexUos, and edge type dslib.graph.EdgeUos.
        Analysis: Time = O(size of the file)
        @param fileName The name of the text file that contains the graph */
    public GraphUos(String fileName) throws RuntimeException {
        this("dslib.graph.VertexUos", "dslib.graph.EdgeUos", fileName);
    }

    /** Construct a new graph with the graph read from a text file, vertex type
        vertexTypeV and edge type edgeTypeE.
        Analysis: Time = O(size of the file)
        @param vertexTypeV The name of the type for the vertices of the graph
        @param edgeTypeE The name of the type for the edges of the graph
        @param fileName The name of the text file that contains the graph */
    public GraphUos(String vertexTypeV, String edgeTypeE, String fileName) {
        vertexType = vertexTypeV;
        edgeType = edgeTypeE;
        fileRead(fileName);
    }
    /** Construct a new graph with the graph read from the console, vertex type
        dslib.graph.VertexUos, and edge type dslib.graph.EdgeUos.
        Analysis: Time = O(size of the input) */
    public GraphUos() {
        this("dslib.graph.VertexUos", "dslib.graph.EdgeUos");
    }

    /** Construct a new graph with the graph read from the console, vertex type
        vertexTypeV and edge type edgeTypeE.
        Analysis: Time = O(size of the input)
        @param vertexTypeV The name of the type for the vertices of the graph
        @param edgeTypeE The name of the type for the edges of the graph */
    public GraphUos(String vertexTypeV, String edgeTypeE) {
        vertexType = vertexTypeV;
        edgeType = edgeTypeE;
        read();
    }

    /** Construct a new random graph with numVertices vertices, with vertex type
        dslib.graph.VertexUos and edge type dslib.graph.EdgeUos, directed or
        undirected according to graphChoice, and the edges randomly generated
        so as to yield an average degree of aveDegree.
        Analysis: Time = O(numVertices*aveDegree)
        PRECONDITION: <ul>
            ((graphChoice.charAt(0) == 'u') || (graphChoice.charAt(0) == 'U'))
            || ((graphChoice.charAt(0) == 'd') || (graphChoice.charAt(0) == 'D')) </ul>
        @param numVertices The number of vertices for the graph
        @param aveDegree The desired average degree of the vertices
        @param graphChoice Indicates whether the graph is directed or undirected */
    public GraphUos(int numVertices, double aveDegree, String graphChoice)
            throws InvalidArgumentUosException {
        this("dslib.graph.VertexUos", "dslib.graph.EdgeUos", numVertices, aveDegree,
                graphChoice);
    }
    /** Construct a new random graph with numVertices vertices, with vertex type
        vertexTypeV and edge type edgeTypeE, make it directed or undirected
        according to graphChoice, and the edges randomly generated so as to yield
        an average degree of aveDegree.
        Analysis: Time = O(numVertices*aveDegree)
        PRECONDITION: <ul>
            ((graphChoice.charAt(0) == 'u') || (graphChoice.charAt(0) == 'U'))
            || ((graphChoice.charAt(0) == 'd') || (graphChoice.charAt(0) == 'D')) </ul>
        @param vertexTypeV The name of the type for the vertices of the graph
        @param edgeTypeE The name of the type for the edges of the graph
        @param numVertices The number of vertices for the graph
        @param aveDegree The desired average degree of the vertices
        @param graphChoice Indicates whether the graph is directed or undirected */
    @SuppressWarnings("unchecked")
    public GraphUos(String vertexTypeV, String edgeTypeE, int numVertices,
            double aveDegree, String graphChoice) throws InvalidArgumentUosException {
        vertexType = vertexTypeV;
        edgeType = edgeTypeE;
        vertexArray = (V[]) new VertexUos[numVertices];
        if ((graphChoice.charAt(0) == 'u') || (graphChoice.charAt(0) == 'U'))
            directed = false;
        else if ((graphChoice.charAt(0) == 'd') || (graphChoice.charAt(0) == 'D'))
            directed = true;
        else
            throw new InvalidArgumentUosException("Invalid argument--graphChoice "
                    + "must be directed or undirected only");
        createEdgeDataStructure();
        n = 0;
        for (int i = 1; i <= numVertices; i++)
            vNewIth(i);

        double density = aveDegree / (n - 1); // probability of an edge
        Random randGen = new Random(42945167);
        for (int i = 1; i <= n(); i++) {
            goIth(i);
            V v1 = item();
            for (int j = 1; j <= n(); j++) {
                if ((directed && i != j) || (!directed && j < i))
                    // for an undirected edge only consider (i, j) when j < i.
                    if (randGen.nextDouble() <= density) {
                        goIth(j);
                        V v2 = item();
                        eNew(v1, v2);
                    }
            }
        }
    }
    /** Return a new vertex whose type is given in the vertexTypeV parameter
        of the constructor.
        Analysis: Time = O(1)
        @param id The id of the new vertex */
    @SuppressWarnings("unchecked")
    protected V createNewVertex(int id) throws InvalidArgumentUosException {
        try {
            return (V) Class.forName(vertexType).getDeclaredConstructors()[0].newInstance(id);
        } catch (Exception e) {
            throw new InvalidArgumentUosException("Invalid argument--vertex type "
                    + "in graph constructor, \nor arguments for vertex constructor."
                    + "\nRecall that the graph constructor must have a String parameter "
                    + "with the fully qualified name (specifying the package) for a "
                    + "vertex type, if it is not dslib.graph.VertexUos.");
        }
    }

    /** Return a new edge whose type is given in the edgeTypeE parameter
        of the constructor.
        Analysis: Time = O(1)
        @param v1 The starting vertex of the edge that is returned
        @param v2 The terminating vertex of the edge that is returned */
    @SuppressWarnings("unchecked")
    protected E createNewEdge(V v1, V v2) throws InvalidArgumentUosException {
        try {
            return (E) Class.forName(edgeType).getDeclaredConstructors()[0].newInstance(v1, v2);
        } catch (Exception e) {
            throw new InvalidArgumentUosException("Invalid argument--edge type "
                    + "in graph constructor (qualified name of type E), "
                    + "\n or arguments for edge constructor (two vertices of type V)."
                    + "\nRecall that the graph constructor must have a String parameter "
                    + "with the fully qualified name (specifying the package) for an "
                    + "edge type, if it is not dslib.graph.EdgeUos.");
        }
    }

    /** Construct and insert a vertex with index id into the graph, where
        the index id cannot be used for any other vertex of the graph.
        Analysis: Time = O(1)
        PRECONDITION: <ul>
            !isFull()
            vertexArrayItem(id) == null </ul>
        @param id index of the new vertex */
    public void vNewIth(int id) throws ContainerFullUosException, DuplicateItemsUosException {
        if (isFull())
            throw new ContainerFullUosException("Cannot add another vertex since "
                    + "graph is full.");
        if (vertexArrayItem(id) != null)
            throw new DuplicateItemsUosException("Cannot create vertex since "
                    + "index id is already used");
        V newItem = createNewVertex(id);
        vertexArraySetItem(newItem, id);
        n++;
    }
    /** Maximum number of vertices for the graph.
        Analysis: Time = O(1) */
    public int capacity() {
        return vertexArray.length;
    }

    /** Number of vertices. */
    protected int n;

    /** Number of vertices.
        Analysis: Time = O(1) */
    public int n() {
        return n;
    }

    /** Number of edges. */
    protected int m;

    /** Number of edges.
        Analysis: Time = O(1) */
    public int m() {
        return m;
    }

    /** Does the graph have any vertices?
        Analysis: Time = O(1) */
    public boolean isEmpty() {
        return n == 0;
    }

    /** Current item (vertex). */
    protected V item;

    /** Is there a current item?
        Analysis: Time = O(1) */
    public boolean itemExists() {
        return item != null;
    }

    /** The current vertex.
        Analysis: Time = O(1)
        PRECONDITION: <ul> itemExists() </ul> */
    public V item() throws NoCurrentItemUosException {
        if (!itemExists())
            throw new NoCurrentItemUosException("There is no item to return.");
        return item;
    }

    /** Index of the current item. */
    protected int itemIndex;

    /** Index of the current item.
        Analysis: Time = O(1) */
    public int itemIndex() {
        return itemIndex;
    }
    /** Set the current vertex (item) to be newItem.
        Analysis: Time = O(1)
        @param newItem new vertex to be set as the current vertex */
    public void goToItem(V newItem) {
        item = newItem;
        itemIndex = newItem.index;
    }

    /** Set item to refer to the vertex with index id.
        Analysis: Time = O(1)
        @param id index of the vertex to become current */
    public void goIth(int id) {
        itemIndex = id;
        if (1 <= id && id <= capacity())
            item = vertexArrayItem(id);
        else
            item = null;
    }

    /** Go to the first vertex.
        Analysis: Time = O(1) */
    public void goFirst() {
        itemIndex = 0;
        goIth(nextNonNullVertexIndex());
    }

    /** Find the next nonempty location in the vertex array. */
    protected int nextNonNullVertexIndex() {
        int i = itemIndex + 1;
        while (i <= capacity() && vertexArrayItem(i) == null)
            i++;
        return i;
    }

    /** Go to the next vertex.
        Analysis: Time = O(capacity), worst case
        PRECONDITION: <ul> !after() </ul> */
    public void goForth() throws AfterTheEndUosException {
        if (after())
            throw new AfterTheEndUosException("Cannot advance to next item "
                    + "since already after.");
        goIth(nextNonNullVertexIndex());
    }
    /** Construct and insert an edge with i and j as the indices of its ends.
        @param i index of the initial vertex of the new edge
        @param j index of the terminal vertex of the new edge */
    public void eNew(int i, int j) {
        CursorPositionUos position = currentPosition();
        goIth(i);
        V v = item();
        goIth(j);
        eNew(v, item());
        if (!directed)
            eNew(item(), v);
        goPosition(position);
    }

    /** Construct and insert an edge with ends v1 and v2 into the graph.
        @param v1 initial vertex of the new edge
        @param v2 terminal vertex of the new edge */
    public abstract void eNew(V v1, V v2);

    /** Is v1 adjacent to v2?
        @param v1 vertex to check if adjacent to v2
        @param v2 vertex to check if adjacent to v1 */
    public abstract boolean adjacent(V v1, V v2);

    /** Set eItem to refer to the edge (u,v), or null if not found.
        @param u The initial vertex of the edge being sought
        @param v terminal vertex of the edge being sought */
    public abstract void eSearch(V u, V v);

    /** The current edge. */
    protected E eItem;

    /** Is there a current edge?
        Analysis: Time = O(1) */
    public boolean eItemExists() {
        return eItem != null;
    }

    /** The current edge.
        Analysis: Time = O(1)
        PRECONDITION: <ul> eItemExists() </ul> */
    public E eItem() throws NoCurrentItemUosException {
        if (!eItemExists())
            throw new NoCurrentItemUosException("Cannot return an edge that "
                    + "does not exist.");
        return eItem;
    }
    /** Go to the first edge of v to start an edge list iteration.
        @param v The vertex whose first edge is to be made the current edge */
    public abstract void eGoFirst(V v);

    /** Go to the next edge of the edge list being scanned.
        PRECONDITION: <ul> !eAfter() </ul> */
    public abstract void eGoForth() throws AfterTheEndUosException;

    /** Are we off the end of the edge list being scanned? */
    public abstract boolean eAfter();

    /** Index of the vertex being edge iterated. */
    protected int iterationIndex;

    /** Index of the vertex being edge iterated.
        Analysis: Time = O(1) */
    public int iterationIndex() {
        return iterationIndex;
    }

    /** The adjacent vertex for an edge list iteration.
        Analysis: Time = O(1)
        PRECONDITION: <ul> eItemExists() </ul> */
    public V adjItem() throws NoCurrentItemUosException {
        if (!eItemExists())
            throw new NoCurrentItemUosException("There is no adjacent item if "
                    + "there is no item.");
        return eItem.other(vertexArrayItem(iterationIndex));
    }

    /** The index of the adjacent vertex in the edge iteration. */
    public abstract int adjIndex();

    /** String representation of the graph for output.
        Analysis: Time = O(max(n, m)) */
    public String toString() {
        CursorPositionUos position = currentPosition();
        StringBuffer result = new StringBuffer();
        result.append(n);
        if (directed)
            result.append(" directed");
        else
            result.append(" undirected");
        goFirst();
        while (!after()) {
            result.append("\n" + item() + " : ");
            eGoFirst(item());
            while (!eAfter()) {
                result.append(" " + eItem().toStringGraphIO(item()));
                eGoForth();
            }
            result.append(" 0");
            goForth();
        }
        goPosition(position);
        return new String(result);
    }
Figure 19.30. Part of the GraphUos class (part 10)

and creation of a vertex is to be done in the GraphUos class. This seems like an impossible objective to achieve. To resolve this problem, more information must be supplied. This is the role of the constructor parameter vertexTypeV that is assigned to field vertexType. The parameter holds a String that must contain the fully qualified name of the actual vertex type. Using this name, the Class object for the class with this name can be obtained by Class.forName(vertexType). From this object, its first constructor can be obtained, and an instance of the vertex class created. The method createNewVertex does this, and then casts the result to the generic type for vertices. Method createNewEdge does the same thing for the creation of an edge. As already stated, if the vertex and edge types are different from VertexUos and EdgeUos, then a constructor with the parameters for the types must be used, and the fully qualified names must be given for the two types.

Method vNewIth() has a parameter for the index of a new vertex created by the method. It has a precondition to verify that the index for the new vertex is not already in use. The other methods of the class should be easy to follow. Note that many of the edge methods are abstract, as this class does not know the way that edges are stored. Also, the method toString() shows the use of the vertex iterator and the edge iterator to traverse the whole graph. Before using the iterators in toString(), the client's present position in the graph is saved. This is important, as a client would not expect the side effect of the current item being changed as a result of calling toString(). After the graph traversal is complete, the graph is returned to the saved position. Not only must we be sure not to change the client's current item, but the client must also make sure not to start an iteration when the current position is still needed, but not saved.
In particular, as we will see in subsequent sections, if nested iterations are needed, the position of the first iteration must be saved before starting the second iteration. A position in a graph is like a cursor position in a list or tree. The next two subsections will investigate the two common representations for the edges of a graph.
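The reflective creation used by createNewVertex can be tried out in isolation. The following is a minimal sketch under stated assumptions: SimpleVertex and LabelledVertex are hypothetical stand-ins for VertexUos and a descendant of it, ReflectiveFactorySketch is a hypothetical helper class, and IllegalArgumentException stands in for InvalidArgumentUosException. Only the chain Class.forName(...).getDeclaredConstructors()[0].newInstance(id) is taken from Figure 19.30.

```java
/** Hypothetical stand-in for a vertex class; not the dslib VertexUos. */
class SimpleVertex {
    protected final int index;
    public SimpleVertex(int id) { index = id; }
    public int index() { return index; }
}

/** Hypothetical descendant that adds a field to the vertex. */
class LabelledVertex extends SimpleVertex {
    protected String label = "";
    public LabelledVertex(int id) { super(id); }
}

/** Hypothetical helper showing creation of a vertex from the name of its class. */
class ReflectiveFactorySketch {
    /**
     * Create an instance of the class named vertexType, passing id to
     * the first constructor that the class declares.
     */
    @SuppressWarnings("unchecked")
    static <V extends SimpleVertex> V createVertex(String vertexType, int id) {
        try {
            // The cast is unchecked: the compiler cannot verify that the
            // named class really is a V.
            return (V) Class.forName(vertexType)
                            .getDeclaredConstructors()[0]
                            .newInstance(id);
        } catch (Exception e) {
            throw new IllegalArgumentException(
                    "Cannot create a vertex of type " + vertexType, e);
        }
    }

    public static void main(String[] args) {
        // getName() yields the fully qualified name that forName expects.
        SimpleVertex v = createVertex(LabelledVertex.class.getName(), 7);
        System.out.println(v.getClass().getName() + " with index " + v.index());
    }
}
```

One caution about this design: the Java documentation states that the array returned by getDeclaredConstructors() is in no particular order, so taking element 0 is only predictable when the vertex class declares exactly one constructor, as both classes in the sketch do.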
19.5.1 Adjacency Matrix Representation
One of the most frequent operations performed on a graph is the function adjacent(v1, v2). The best data structure to use for a graph to facilitate this operation is the adjacency matrix. Such a matrix was defined at the end of Section 19.3 and is shown in
Figure 19.21d. Provided that each vertex has a unique index, the two indices of the arguments for adjacent(v1, v2) can be used to access, in O(1) time, the location of the matrix that indicates whether the edge exists. Therefore, if adjacency tests are common, the adjacency matrix should be considered as a graph representation.

The description in Section 19.3 specifies that an adjacency matrix stores 0s and 1s (or falses and trues). Such a representation is sufficient if there is no information for an edge other than its ends. However, often an edge has a name, weight, or label. Therefore, there is a need for an object to represent an edge. Thus, we use the edge class of the preceding subsection, and each entry in the matrix will be null, an EdgeUos object, or an instance of a descendant of EdgeUos. The resulting class is GraphMatrixRepUos, a descendant of GraphUos. Like GraphUos, the class GraphMatrixRepUos has two generic parameters, V extends VertexUos and E extends EdgeUos<V>, for the vertex and edge types, respectively. One of the two instance variables of this class is
E[ ][ ] adjMatrix; // matrix to store the edges
and the new part of creating a graph is

adjMatrix = (E[][]) new EdgeUos[cap][cap];
where cap is the parameter for the maximum number of vertices to be allowed in the graph, its capacity(). The dimension of the matrix is capacity × capacity, rather than n × n, since the graph can grow to its capacity. Note that the amount of storage needed for the graph is the size of the matrix, namely O(capacity^2).

The conceptual, structural, and implementation diagrams for an undirected graph of type GraphMatrixRepUos are shown in Figure 19.31. The conceptual diagram is the same as we have been using. The structural diagram resembles an adjacency matrix, except that the entries are either a reference to an undirected edge or null. In addition, the matrix indices are 0, ..., n - 1, whereas the vertex indices are 1, ..., n as per standard graph notation. The implementation diagram is quite complex even for this very simple graph. Nevertheless, the vertexArray of vertices is easy to find along the top right, and the adjMatrix of edges is shown at the bottom right. The three vertices and two edges are toward the middle on the right side of the diagram. The adjMatrix is shown as two dimensional, even though its layout in memory would be linear and include the length of each dimension. Note that because the graph is undirected, the (0, 1) entry and the (1, 0) entry of the matrix refer to the same undirected edge. In a directed graph, these two entries would refer to different edges. The edge {1, 2} is in adjMatrix[0][1], because the vertex indices are 1 to n while the array indices are 0 to n - 1.

The remaining fields shown in the graph object are used for the operations of the ADT. Fields n and m store the current number of vertices and edges, respectively. Field item is the current vertex, if it is defined, and itemIndex is its index. During an edge iteration, eItem is the current edge of the iteration, iterationIndex stores the index of the vertex whose edge list is being iterated, and adjIndex is the index of the other vertex (the one not being iterated) of the current edge.
Except for adjIndex, these fields are defined in class GraphUos. The implementation diagram assumes that vertex 3 is the current vertex, vertex 2 is being iterated (iterationIndex = 2), and the current edge is the {1, 2} edge. The actual EdgeUos and GraphMatrixRepUos classes are generic, but this is not shown in the diagram to keep it simpler. Also, the vertexType and edgeType fields are not shown.

In Figure 19.31, the graph has its vertices labeled by consecutive integers. Thus, each vertex has an obvious index. However, in many applications, the vertices are not labeled
[Figure 19.31 diagrams: (a) conceptual diagram, (b) structural diagram, and (c) implementation diagram of an undirected graph with vertices 1, 2, 3 and edges {1, 2} and {1, 3}. In (c), the GraphMatrixRepUos object has n = 3, m = 2, directed = false, itemIndex = 3, iterationIndex = 2, and adjIndex = 1.]
Figure 19.31. Views of the matrix representation of a simple undirected graph

by integers; either they are not labeled or they are labeled by alphabetical names. In such a case, a unique index still must be associated with each vertex. Each different set of vertex-index associations usually results in a different adjacency matrix for the same graph. However, as previously indicated, we shall neglect the arbitrariness introduced in an adjacency matrix owing to the ordering of the vertices, and consider any adjacency matrix of the graph to be the adjacency matrix of the graph.

The Java implementation of class GraphMatrixRepUos is given in Figure 19.32. Only the constructors to create an empty graph are shown. The constructors to read a graph
package dslib.graph;

import dslib.exception.*;
import dslib.base.CursorPositionUos;

/** A graph with n vertices and m edges using a matrix to store the edges
    between adjacent vertex pairs. The graph is either directed or undirected
    depending upon the creation procedure used. The current vertex (for searches
    and iteration) is item, and the current edge (for searches and iteration)
    is eItem. */
public class GraphMatrixRepUos<V extends VertexUos, E extends EdgeUos<V>>
        extends GraphUos<V, E> {
    /** Internal representation of the adjacency matrix. */
    protected E[][] adjMatrix;

    /** The index of the adjacent vertex in the edge iteration. */
    protected int adjIndex = 0;

    /** Construct a new undirected graph with capacity for up to cap vertices,
        with vertex type dslib.graph.VertexUos and edge type dslib.graph.EdgeUos.
        Analysis: Time = O(cap)
        @param cap Maximum number of vertices allowed in the graph */
    public GraphMatrixRepUos(int cap) {
        this("dslib.graph.VertexUos", "dslib.graph.EdgeUos", cap, "undirected");
    }

    /** Construct a new undirected graph with capacity for up to cap vertices,
        with vertex type vertexTypeV and edge type edgeTypeE.
        Analysis: Time = O(cap)
        @param vertexTypeV The name of the type for the vertices of the graph
        @param edgeTypeE The name of the type for the edges of the graph
        @param cap Maximum number of vertices allowed in the graph */
    public GraphMatrixRepUos(String vertexTypeV, String edgeTypeE, int cap) {
        this(vertexTypeV, edgeTypeE, cap, "undirected");
    }

    /** Construct a new graph with capacity for up to cap vertices, with vertex
        type dslib.graph.VertexUos and edge type dslib.graph.EdgeUos, and make
        it directed or undirected according to graphChoice.
        Analysis: Time = O(cap)
        @precond ((graphChoice.charAt(0) == 'u') || (graphChoice.charAt(0) == 'U'))
            || ((graphChoice.charAt(0) == 'd') || (graphChoice.charAt(0) == 'D'))
        @param cap Maximum number of vertices allowed in the graph
        @param graphChoice Indicates whether the graph is directed or undirected */
    public GraphMatrixRepUos(int cap, String graphChoice) throws InvalidArgumentUosException {
        this("dslib.graph.VertexUos", "dslib.graph.EdgeUos", cap, graphChoice);
    }
Figure 19.32. The GraphMatrixRepUos class (part 1)
    /** Construct a new graph with capacity for up to cap vertices, with vertex
        type vertexTypeV and edge type edgeTypeE, and make it directed or
        undirected according to graphChoice.
        Analysis: Time = O(cap)
        @precond ((graphChoice.charAt(0) == 'u') || (graphChoice.charAt(0) == 'U'))
            || ((graphChoice.charAt(0) == 'd') || (graphChoice.charAt(0) == 'D'))
        @param vertexTypeV The name of the type for the vertices of the graph
        @param edgeTypeE The name of the type for the edges of the graph
        @param cap Maximum number of vertices allowed in the graph
        @param graphChoice Indicates whether the graph is directed or undirected */
    @SuppressWarnings("unchecked")
    public GraphMatrixRepUos(String vertexTypeV, String edgeTypeE, int cap,
            String graphChoice) throws InvalidArgumentUosException {
        super(vertexTypeV, edgeTypeE, cap, graphChoice);
        adjMatrix = (E[][]) new EdgeUos[cap][cap];
    }

    /** Is v1 adjacent to v2?
        Analysis: Time = O(1)
        @param v1 vertex to check if adjacent to v2
        @param v2 vertex to check if adjacent to v1 */
    public boolean adjacent(V v1, V v2) {
        return adjMatrixItem(v1.index, v2.index) != null;
    }

    /** Construct and insert an edge with ends v1 and v2 into the graph.
        Analysis: Time = O(1)
        @param v1 initial vertex of the new edge
        @param v2 terminal vertex of the new edge */
    public void eNew(V v1, V v2) {
        eItem = createNewEdge(v1, v2);
        adjMatrixSetItem(eItem, v1.index, v2.index);
        if (!directed)
            adjMatrixSetItem(eItem, v2.index, v1.index);
        m++;
    }

    /** Move to edge (v1,v2), if it exists.
        Analysis: Time = O(1)
        @param v1 initial vertex of the edge being sought
        @param v2 terminal vertex of the edge being sought */
    public void eSearch(V v1, V v2) {
        eItem = adjMatrixItem(v1.index, v2.index);
        iterationIndex = v1.index;
        adjIndex = v2.index;
    }
    /** Remove all vertices from the graph.
        Analysis: Time = O(capacity^2) */
    public void clear() {
        super.clear();
        for (int i = 1; i <= adjMatrix.length; i++)
            for (int j = 1; j <= adjMatrix.length; j++)
                adjMatrixSetItem(null, i, j);
    }

    /** Go to the first edge of v1 starting an edge list iteration.
        Analysis: Time = O(n), worst case
        @param v1 vertex whose edges are to be iterated */
    public void eGoFirst(V v1) {
        iterationIndex = v1.index;
        adjIndex = 0;
        eGoForth();
    }

    /** The index of the adjacent vertex in the edge iteration.
        Analysis: Time = O(1) */
    public int adjIndex() {
        return adjIndex;
    }

    /** Go to the next edge of the edge list being scanned.
        Analysis: Time = O(n), worst case
        @precond !eAfter() */
    public void eGoForth() throws AfterTheEndUosException {
        if (eAfter())
            throw new AfterTheEndUosException("Cannot go to next edge since already after.");
        adjIndex = nextNonNullAdjIndex();
        if (eAfter())
            eItem = null;
        else
            eItem = adjMatrixItem(iterationIndex, adjIndex);
    }

    /** Are we off the end of the edge list being scanned?
        Analysis: Time = O(1) */
    public boolean eAfter() {
        return adjIndex > capacity();
    }
885
/** Next nonnull vertex. Analysis: Time = O(capcity) */ nextNonNullAdjIndex() { result = adjIndex + 1; ((result <= capacity()) && (adjMatrixItem(iterationIndex, result) == )) result++; result; } /** Delete the current edge. Analysis: Time = O(1) @precond: eItemExists() */ deleteEItem() NoCurrentItemUosException { (!eItemExists()) NoCurrentItemUosException("Cannot delete an item that does not exist."); adjMatrixSetItem(, eItem.firstItem().index, eItem.secondItem().index); (!directed) adjMatrixSetItem(, eItem.secondItem().index, eItem.firstItem().index); m--; eGoForth(); } /** The current position in the graph. Analysis: Time = O(1) */ CursorPositionUos currentPosition() { GraphMatrixRepPositionUos<V, E>(item, itemIndex, iterationIndex, eItem, adjIndex); } /** Go to the position pos of the graph. Analysis: Time = O(1) @param pos The graph position that is to become the current position */ goPosition(CursorPositionUos pos) { GraphMatrixRepPositionUos<V, E> matrixPos = (GraphMatrixRepPositionUos<V, E>) pos; item = matrixPos.item; itemIndex = matrixPos.itemIndex; iterationIndex = matrixPos.iterationIndex; eItem = matrixPos.eItem; adjIndex = matrixPos.adjIndex; }
886
/** A shallow clone of this object. Analysis: Time = O(1) */ GraphMatrixRepUos<V, E> clone() { (GraphMatrixRepUos<V, E>) .clone(); } /** The edge between the vertices with indices i and j. Analysis: Time = O(1) */ E adjMatrixItem( i, j) { /* Vertex indices 1 through n are mapped to 0 through n - 1. */ adjMatrix[i-1][j-1]; } /** Set the edge between the vertices with indices i and j. Analysis: Time = O(1) */ adjMatrixSetItem(E e, i, j) { /* Vertex indices 1 through n are mapped to 0 through n - 1. */ adjMatrix[i-1][j-1] = e; } }
Figure 19.32. The GraphMatrixRepUos class (part 5)

from a text file, to read a graph from the console, and to create a random graph exist in the class but are not shown. The methods of this class should be easy to follow. Note that one matrix access is sufficient to determine the result for the adjacent() function. Therefore, the time for this operation is O(1), much better than doing a search of a large container containing all the edges. Now, consider the other common operation in a graph, finding all edges incident to a specified vertex. In performing such an operation, it is necessary to scan across the row of the matrix corresponding to the vertex. This is a relatively simple task, especially when using the edge iteration facility, but it takes time O(capacity). It is much better than scanning through all the edges of the whole graph, which would be necessary if the edges were just stored in one large container. However, it still takes O(capacity) time to find the incident edges, even if there are only a couple of them and capacity is large.
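The trade-off just described can be seen in a stripped-down sketch. This is not the dslib implementation: the class MatrixGraphSketch and its methods are hypothetical names chosen for illustration, and a boolean matrix stands in for the matrix of edge objects.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical names, not part of dslib.
class MatrixGraphSketch {
    // adj[i-1][j-1] is true when the directed edge (i, j) exists.
    private final boolean[][] adj;

    MatrixGraphSketch(int capacity) {
        adj = new boolean[capacity][capacity];
    }

    /** Insert the directed edge (v1, v2); vertices are numbered 1..capacity. */
    void addEdge(int v1, int v2) {
        adj[v1 - 1][v2 - 1] = true;
    }

    /** A single matrix access, so the time is O(1). */
    boolean adjacent(int v1, int v2) {
        return adj[v1 - 1][v2 - 1];
    }

    /** Scans the entire row of v, so the time is O(capacity) even when v has few edges. */
    List<Integer> neighbours(int v) {
        List<Integer> result = new ArrayList<>();
        for (int j = 1; j <= adj.length; j++)
            if (adj[v - 1][j - 1])
                result.add(j);
        return result;
    }
}
```

In this sketch, neighbours(v) always inspects capacity entries, which is the cost that the adjacency lists representation of the next section is designed to avoid.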
19.5.2
Adjacency Lists Representation
The other common representation of a graph, the adjacency lists representation, seeks to optimize the operation of finding the edges incident to a specified vertex. As a result, it stores with each vertex a linked list of its incident edges. Thus, the adjacent vertices for a vertex v can be found in time O(degree(v)) (i.e., the number of edges incident to v). For this representation, the following array is used:
LinkedListUos<E>[ ] adjListsArray;
where E is the generic type for the edge type. The lists could be defined to be a SimpleList, i.e., have type LinkedSimpleListUos<E>, but such a class lacks some methods useful for the present purpose. In particular, when reading the edges of a graph, generally a new edge should be added to the end of an adjacency list. Thus, the capability to add to the end of a list is needed. Also, edge iteration will iterate down an adjacency list. Therefore, a list type with iteration capability is convenient. As a result, an adjacency list is stored in an instance of LinkedListUos<E>.

The conceptual, structural, and implementation diagrams for a directed graph of type GraphLinkedRepUos are shown in Figure 19.33. The conceptual diagram should be familiar by now. Note that the vertices are labeled by letters in this diagram. As we have indicated, in order to be able to index the location of a vertex in vertexArray, each vertex must be associated with a unique index. These integer values are shown in the other two diagrams. As can easily be seen, with a different association of indices to vertices, a somewhat different representation would be obtained for the graph. These differences, which are obtained by different associations, will be ignored. The structural diagram of Figure 19.33 shows the three linked lists of edges. Since the graph is directed, the edge (1, 2) is distinct from the edge (2, 1). The implementation diagram is even more complex than the one for the matrix representation. The upper right part of the diagram has the array of vertices, and the individual vertices are the same as for the matrix representation. On the bottom left of the implementation diagram, there is the array of adjacency lists. Each adjacency list is an instance of LinkedListUos<E>, with only the firstNode field shown to reduce the size of the figure. Also, the generic parameters of various classes have been omitted. The nodes of the linked lists, shown in the bottom right of the diagram, refer to edges of the graph. The edges appear toward the middle on the right-hand side of the diagram. As would be expected, the implementation shows that there are two objects to represent the two edges (1, 2) and (2, 1), whereas in the undirected case there is only one object to represent the edge {1, 2}.

The implementation diagram assumes that vertex 3, b, is the current vertex, vertex 2 is being iterated (iterationIndex == 2), and the current edge is the (2, 1) edge. The class GraphLinkedRepUos is given in Figure 19.34. Again, only the constructors to create an empty graph are shown. Note that in the function adjacent(), the current position is saved at the start of the function and restored at the end of the function to ensure that the operation does not have the side effect of changing the current edge of the client.

Because the vertex array must be stored and a node of a linked list must be stored for each edge, the amount of storage required is given by O(capacity + m), where capacity is the maximum number of vertices allowed and m denotes the number of edges. Thus, for large graphs with few edges, the adjacency lists representation will require significantly less storage than the adjacency matrix.

We have discussed finding the edges incident to a specified vertex, but what about the other common operation adjacent(v1, v2)? With the adjacency lists representation, the function must scan the adjacency list of v1 to check for v2. The time for this is O(degree(v1)). Note that O(.) is appropriate as the scan can stop as soon as v2 is found. Hence, the operation is not as efficient for the adjacency lists representation as for the adjacency matrix representation. Table 19.2 shows the comparison of the two representations; it indicates that the best representation depends on which of the common operations is performed most often and perhaps on the density (m/capacity^2) of the graph.
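The following minimal sketch illustrates the same idea with the standard java.util collections instead of LinkedListUos. The class ListGraphSketch and its methods are hypothetical names, and each list stores the index of the adjacent vertex rather than an edge object.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch only: hypothetical names, java.util lists instead of LinkedListUos.
class ListGraphSketch {
    // adjLists.get(i-1) holds the terminal vertices of the edges leaving vertex i.
    private final List<LinkedList<Integer>> adjLists = new ArrayList<>();
    private int m = 0; // number of edges

    ListGraphSketch(int capacity) {
        for (int i = 0; i < capacity; i++)
            adjLists.add(new LinkedList<>());
    }

    /** Append the directed edge (v1, v2) to the end of the list of v1: O(1). */
    void addEdge(int v1, int v2) {
        adjLists.get(v1 - 1).addLast(v2);
        m++;
    }

    /** Scans the list of v1 and stops as soon as v2 is found: O(degree(v1)). */
    boolean adjacent(int v1, int v2) {
        for (int w : adjLists.get(v1 - 1))
            if (w == v2)
                return true;
        return false;
    }

    /** The adjacent vertices of v, available without scanning a whole matrix row. */
    List<Integer> neighbours(int v) {
        return adjLists.get(v - 1);
    }

    int edgeCount() {
        return m;
    }
}
```

Storage in this sketch is one list header per vertex plus one node per edge, matching the O(capacity + m) bound given above.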
19.5.3
Searchable Graph
When writing algorithms to traverse through a graph looking for some property of the graph, it is necessary to be very careful not to get into an infinite loop scanning around and around some cycle formed by the edges. Consequently, it is necessary to mark a vertex that the algorithm has already visited so that it is not revisited repeatedly. To do the marking, it is necessary to add another instance variable to a vertex object, and have a graph object use
[Figure 19.33. Views of the adjacency lists representation of a simple directed graph: (a) conceptual diagram with vertices a, b, c; (b) structural diagram with the edge lists (1, 2), (1, 3) and (2, 1); (c) implementation diagram]
package dslib.graph;

import dslib.list.*;
import dslib.exception.*;
import dslib.base.CursorPositionUos;

/** A graph with n vertices and m edges using a linked list to store the list of
    adjacent vertices for each vertex. The graph is either directed or undirected
    depending upon the creation procedure used. The current vertex (for searches and
    iteration) is item, and the current edge (for searches and iteration) is eItem. */
public class GraphLinkedRepUos<V extends VertexUos, E extends EdgeUos<V>> extends GraphUos<V, E>
{
    /** Internal representation of adjacency list. */
    protected LinkedListUos<E>[] adjListsArray;

    /** Construct a new undirected graph with capacity for up to cap vertices, with
        vertex type dslib.graph.VertexUos and edge type dslib.graph.EdgeUos.
        Analysis: Time = O(cap)
        @param cap Maximum number of vertices allowed in the graph */
    public GraphLinkedRepUos(int cap)
    {
        this("dslib.graph.VertexUos", "dslib.graph.EdgeUos", cap, "undirected");
    }

    /** Construct a new undirected graph with capacity for up to cap vertices, with
        vertex type vertexTypeV and edge type edgeTypeE.
        Analysis: Time = O(cap)
        @param vertexTypeV The name of the type for the vertices of the graph
        @param edgeTypeE The name of the type for the edges of the graph
        @param cap Maximum number of vertices allowed in the graph */
    public GraphLinkedRepUos(String vertexTypeV, String edgeTypeE, int cap)
    {
        this(vertexTypeV, edgeTypeE, cap, "undirected");
    }

    /** Construct a new graph with capacity for up to cap vertices, with vertex type
        dslib.graph.VertexUos and edge type dslib.graph.EdgeUos, and make it directed
        or undirected according to graphChoice.
        Analysis: Time = O(cap)
        @precond ((graphChoice.charAt(0) == 'u') || (graphChoice.charAt(0) == 'U'))
                 || ((graphChoice.charAt(0) == 'd') || (graphChoice.charAt(0) == 'D'))
        @param cap Maximum number of vertices allowed in the graph
        @param graphChoice Indicates whether the graph is directed or undirected */
    public GraphLinkedRepUos(int cap, String graphChoice) throws InvalidArgumentUosException
    {
        this("dslib.graph.VertexUos", "dslib.graph.EdgeUos", cap, graphChoice);
    }
Figure 19.34. The GraphLinkedRepUos class (part 1)
    /** Construct a new graph with capacity for up to cap vertices, with vertex type
        vertexTypeV and edge type edgeTypeE, and make it directed or undirected
        according to graphChoice.
        Analysis: Time = O(cap)
        @precond ((graphChoice.charAt(0) == 'u') || (graphChoice.charAt(0) == 'U'))
                 || ((graphChoice.charAt(0) == 'd') || (graphChoice.charAt(0) == 'D'))
        @param vertexTypeV The name of the type for the vertices of the graph
        @param edgeTypeE The name of the type for the edges of the graph
        @param cap Maximum number of vertices allowed in the graph
        @param graphChoice Indicates whether the graph is directed or undirected */
    @SuppressWarnings("unchecked")
    public GraphLinkedRepUos(String vertexTypeV, String edgeTypeE, int cap, String graphChoice)
            throws InvalidArgumentUosException
    {
        super(vertexTypeV, edgeTypeE, cap, graphChoice);
        adjListsArray = new LinkedListUos[cap];
    }

    /** Construct and insert a vertex with index id into the graph.
        Analysis: Time = O(1)
        @precond !isFull()
                 !(vertexList[id] != null)
        @param name name of the new vertex
        @param id index of the new vertex */
    public void vNewIth(String name, int id)
            throws ContainerFullUosException, DuplicateItemsUosException
    {
        if (isFull())
            throw new ContainerFullUosException("Cannot insert another vertex "
                    + "since the graph is full.");
        if (vertexArrayItem(id) != null)
            throw new DuplicateItemsUosException("Cannot insert a vertex that "
                    + "already exists.");
        super.vNewIth(name, id);
        adjListsArraySetItem(new LinkedListUos<E>(), id);
    }

    /** Construct and insert an edge with ends v1 and v2 into the graph.
        Analysis: Time = O(1)
        @param v1 initial vertex of the new edge
        @param v2 terminal vertex of the new edge */
    public void eNew(V v1, V v2)
    {
        E newEItem = createNewEdge(v1, v2);
        adjListsArrayItem(v1.index).insertLast(newEItem);
        if (!directed)
            adjListsArrayItem(v2.index).insertLast(newEItem);
        m++;
    }
    /** Move to the edge (v1, v2), if it exists.
        Analysis: Time = O(n), where n = size of the edge list associated with v1
        @param v1 initial vertex of the edge being sought
        @param v2 terminal vertex of the edge being sought */
    public void eSearch(V v1, V v2)
    {
        eGoFirst(v1);
        while (!eAfter() && (adjItem() != v2))
            eGoForth();
    }

    /** Delete the current edge and move to the next edge for the vertex being iterated.
        Analysis: Time = O(1)
        @precond eItemExists() */
    public void deleteEItem() throws NoCurrentItemUosException
    {
        if (!eItemExists())
            throw new NoCurrentItemUosException("Cannot delete an item that does not "
                    + "exist.");
        adjListsArrayItem(eItem.firstItem().index).delete(eItem);
        if (!directed)
            adjListsArrayItem(eItem.secondItem().index).delete(eItem);
        if (adjListsArrayItem(eItem.firstItem().index).after())
            eItem = null;
        else
            eItem = adjListsArrayItem(eItem.firstItem().index).item();
        m--;
    }

    /** Go to the first edge of v1 starting edge list iteration.
        Analysis: Time = O(1)
        @param v1 The vertex whose adjacency list is to be iterated */
    public void eGoFirst(V v1)
    {
        iterationIndex = v1.index();
        LinkedListUos<E> adjList = adjListsArrayItem(iterationIndex);
        adjList.goFirst();
        if (adjList.after())
            eItem = null;
        else
            eItem = adjList.item();
    }

    /** Go to the next edge of the edge list being scanned.
        Analysis: Time = O(1)
        @precond !eAfter() */
    public void eGoForth() throws AfterTheEndUosException
    {
        if (eAfter())
            throw new AfterTheEndUosException("Cannot proceed to next edge since "
                    + "already after.");
        LinkedListUos<E> adjList = adjListsArrayItem(iterationIndex);
        adjList.goForth();
        if (adjList.after())
            eItem = null;
        else
            eItem = adjList.item();
    }
    /** Are we after the last edge for current adjacency list?.
        Analysis: Time = O(1) */
    public boolean eAfter()
    {
        return adjListsArrayItem(iterationIndex).after();
    }

    /** The index of the adjacent vertex for an edge list iteration.
        Analysis: Time = O(1)
        @precond eItemExists() */
    public int adjIndex()
    {
        if (eItemExists())
            return adjItem().index();
        else
            throw new NoIteratorItemUosException("Must have a current edge to obtain "
                    + "the index of the adjacent vertex.");
    }

    /** Is v1 adjacent to v2?.
        Analysis: Time = O(degree(v1)), worst case
        @param v1 vertex to check if adjacent to v2
        @param v2 vertex to check if adjacent to v1 */
    public boolean adjacent(V v1, V v2)
    {
        CursorPositionUos position = currentPosition();
        eSearch(v1, v2);
        boolean result = (eItem != null);
        goPosition(position);
        return result;
    }

    /** Remove all vertices from the graph.
        Analysis: Time = O(capacity) */
    public void clear()
    {
        super.clear();
        for (int i = 1; i <= adjListsArray.length; i++) // wipe out array
            adjListsArraySetItem(null, i);
    }

    /** The current position in the graph.
        Analysis: Time = O(1) */
    public CursorPositionUos currentPosition()
    {
        if (iterationIndex != 0)
            return new GraphLinkedRepPositionUos<V, E>(item, itemIndex, iterationIndex, eItem,
                    adjListsArrayItem(iterationIndex).currentPosition());
        else
            return new GraphLinkedRepPositionUos<V, E>(item, itemIndex, iterationIndex, eItem, null);
    }
    /** Go to the position pos of the graph.
        Analysis: Time = O(1)
        @param pos The graph position that is to become the current position */
    public void goPosition(CursorPositionUos pos)
    {
        GraphLinkedRepPositionUos<V, E> linkedPos = (GraphLinkedRepPositionUos<V, E>) pos;
        item = linkedPos.item;
        itemIndex = linkedPos.itemIndex;
        iterationIndex = linkedPos.iterationIndex;
        eItem = linkedPos.eItem;
        if (iterationIndex != 0)
            adjListsArrayItem(iterationIndex).goPosition(linkedPos.listPosition);
    }

    /** A shallow clone of this object.
        Analysis: Time = O(1) */
    public GraphLinkedRepUos<V, E> clone()
    {
        return (GraphLinkedRepUos<V, E>) super.clone();
    }

    /** The ith adjacency list of the graph.
        Analysis: Time = O(1) */
    protected LinkedListUos<E> adjListsArrayItem(int i)
    {
        /* lists 1 through n are stored in 0 through n-1. */
        return adjListsArray[i-1];
    }

    /** Set the ith adjacency list of the graph.
        Analysis: Time = O(1) */
    protected void adjListsArraySetItem(LinkedListUos<E> x, int i)
    {
        /* vertices 1 through n are stored in 0 through n-1. */
        adjListsArray[i-1] = x;
    }

    /** A unique class identifier for serializing and deserializing. */
    static final long serialVersionUID = 1L;
}
Figure 19.34. The GraphLinkedRepUos class (part 5)

Table 19.2. Comparison of Adjacency Lists and Adjacency Matrix Representations

Comparison Criteria          Adjacency Matrix    Adjacency Lists
adjacent(v1, v2)             O(1)                O(degree(v1))
all incident edges for v     O(capacity)         O(degree(v))
storage                      O(capacity^2)       O(capacity + m)

these marked vertex objects. The key to implementing this change is to define the new vertex type, use it for the generic parameter of class GraphUos, and specify its fully-qualified name for the vertexType parameter of the constructor. The new vertex type with the added instance variable is given in Figure 19.35. It is a straightforward definition of a descendant with the additional instance variable. As this
package dslib.graph;

/** A searchable vertex that can be marked reached or not. */
public class SearchVertexUos extends VertexUos
{
    /** Has this vertex been reached?. */
    protected boolean reached = false;

    /** Construct a new vertex, store id as the vertex's index.
        Analysis: Time = O(1)
        @param id index of the new vertex */
    public SearchVertexUos(int id)
    {
        super(id);
    }

    /** Has this vertex been reached?.
        Analysis: Time = O(1) */
    public boolean reached()
    {
        return reached;
    }

    /** Set the reached value of this vertex.
        Analysis: Time = O(1) */
    public void setReached(boolean newReached)
    {
        reached = newReached;
    }

    /** A shallow clone of this vertex.
        Analysis: Time = O(1) */
    public SearchVertexUos clone()
    {
        return (SearchVertexUos) super.clone();
    }
}
Figure 19.35. Vertex with reached instance variable

class is in package dslib.graph, an empty undirected graph of size 10 that uses this new vertex type can be declared and created as follows:
GraphLinkedRepUos<SearchVertexUos, EdgeUos<SearchVertexUos>> gp =
    new GraphLinkedRepUos<SearchVertexUos, EdgeUos<SearchVertexUos>>(
        "dslib.graph.SearchVertexUos", "dslib.graph.EdgeUos", 10);
Because the vertex type is a descendant of VertexUos, a constructor must be used with parameters for the String names of the types, and the names must be fully qualified (package and all). Of course, if new methods are to be added to the graph class, a new class must be defined. It would be a straightforward descendant of GraphLinkedRepUos or GraphMatrixRepUos.

Problems 19.5.3
1. Define a descendant of class EdgeUos that has a field called weight. Also, define a graph class that is a descendant of class GraphMatrixRepUos or class GraphLinkedRepUos and has weighted edges. Note that the read and fileRead methods should be modified to handle the weights.
2. Define a graph class, a descendant of class GraphMatrixRepUos or class GraphLinkedRepUos, which maintains the degree of each vertex. This includes defining an appropriate class for the vertices and modifying the relevant graph features to ensure that the values for the degrees are always correct.

[Figure 19.36. A small digraph on the nodes v1, v2, v3, v4, and v5]
19.6
Computing Paths from a Matrix Representation of Graphs

As mentioned in Section 19.4, the paths in a graph are often very important. Therefore, this section will focus on the computation of paths for a graph. Often, the path information for a graph is stored in a matrix. As a result, the common representation of a graph when computing path information is the adjacency matrix representation. It is also appropriate because well-known operations of matrix algebra can be used to calculate paths, cycles, and other characteristics of a graph. As a result, in this section we will assume that we are given a 0-1 adjacency matrix $A$ for the graph. The algorithm of Figure 19.22 generates such a matrix for an arbitrary graph g by using ADT operations. Thus, regardless of whether we start with the graph represented by its adjacency matrix, it is easy to obtain the matrix. This section will assume that we are given a directed graph (digraph). The analysis also applies to undirected graphs, as each undirected edge can be replaced by two directed edges, one going in each direction. If $\{v_1, v_2\}$ is an undirected edge, the transformation will allow the simple cycle $\langle v_1, v_2, v_1 \rangle$, which is not a simple cycle in the corresponding undirected graph. However, in general, the algorithms of this section do not distinguish between simple paths and nonsimple paths, so it is not a problem.
19.6.1
Computing Reachability, Using Matrix Multiplications
Consider the example digraph given in Figure 19.36, where we assume that the nodes are ordered $v_1, v_2, v_3, v_4$, and $v_5$. Its adjacency matrix is

\[
A = \begin{pmatrix}
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 0
\end{pmatrix}
\]

By definition, an entry of 1 in the $i$th row and $j$th column of $A$ shows the existence of an edge $(v_i, v_j)$, that is, a path of length 1 from $v_i$ to $v_j$. This element of the matrix is denoted by $a_{ij}$. Now, define a matrix $A^2$ by the elements $a^{2}_{ij}$ given by

\[
a^{2}_{ij} = \sum_{k=1}^{n} a_{ik} a_{kj}
\]

Note that this is the normal matrix multiplication of $A$ times itself. Let us now see the interpretation of such a matrix from a graph perspective. For any fixed $k$, $a_{ik} a_{kj} = 1$ if and only if both $a_{ik}$ and $a_{kj}$ equal 1, that is, when $(v_i, v_k)$ and $(v_k, v_j)$ are both edges of the graph. Now existence of $(v_i, v_k)$ and $(v_k, v_j)$ implies that there is a path from $v_i$ to $v_j$ of length 2. For each such $k$, we get a contribution of 1 in the sum. Therefore, $a^{2}_{ij}$ is equal to the number of different paths of length exactly 2 from $v_i$ to $v_j$. For any $r > 1$, the $r$th power of $A$, $A^r$, can be defined by the elements $a^{r}_{ij}$, where

\[
a^{r}_{ij} = \sum_{k=1}^{n} a^{r-1}_{ik} a_{kj}
\]

and $A^1 = A$. The element in the $i$th row and the $j$th column of $A^r$, for $r > 0$, is equal to the number of paths of length $r$ from the $i$th node to the $j$th node. This fact can easily be verified by mathematical induction (see Section C.4). The following matrices are obtained from $A$:

\[
A^2 = \begin{pmatrix}
0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
1 & 2 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0
\end{pmatrix}
\qquad
A^3 = \begin{pmatrix}
1 & 2 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 2 & 1 & 0 \\
0 & 1 & 1 & 0 & 1
\end{pmatrix}
\]

\[
A^4 = \begin{pmatrix}
0 & 0 & 2 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 2 & 1 & 0 & 1 \\
1 & 2 & 1 & 0 & 0
\end{pmatrix}
\qquad
A^5 = \begin{pmatrix}
0 & 2 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
1 & 2 & 2 & 0 & 0 \\
0 & 1 & 2 & 1 & 0
\end{pmatrix}
\]

For the graph given in Figure 19.36, we see that there is one path of length 2 from $v_4$ to $v_1$; hence, the entry 1 in the fourth row and first column of $A^2$. Similarly, there are two paths of length 2 from $v_4$ to $v_2$, $\langle 4, 5, 2 \rangle$ and $\langle 4, 3, 2 \rangle$; thus, the corresponding entry in $A^2$ is 2. There are two paths of length 4 from $v_4$ to $v_2$, $\langle 4, 5, 2, 3, 2 \rangle$ and $\langle 4, 3, 2, 3, 2 \rangle$, so the corresponding entry in $A^4$ is 2. Note that these paths are not necessarily elementary (distinct nodes) or even simple (distinct edges).

Given a simple digraph $G = (V, E)$, let $v_i$ and $v_j$ be any two nodes of $G$. From the adjacency matrix $A$, we can immediately determine whether there exists an edge from $v_i$ to $v_j$ in $G$. Additionally, from the matrix $A^r$, where $r$ is some positive integer, we can establish the number of paths of length $r$ from $v_i$ to $v_j$. Using matrix addition, we can define $B_r$ by

\[
B_r = A + A^2 + \cdots + A^r
\]

From the matrix $B_r$ we can determine the number of paths of length less than or equal to $r$ from $v_i$ to $v_j$.
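The path counts above can be checked mechanically. The following is a minimal sketch, assuming a plain int[][] holds the adjacency matrix of Figure 19.36 with 0-based array indices; the class PathCountSketch and its methods are hypothetical names, not part of dslib.

```java
// Illustrative sketch only: hypothetical names, not part of dslib.
class PathCountSketch {
    // Adjacency matrix of the digraph of Figure 19.36 (row i-1, column j-1 for edge (v_i, v_j)).
    static final int[][] A = {
        {0, 0, 0, 1, 0},
        {0, 0, 1, 0, 0},
        {0, 1, 0, 0, 0},
        {0, 0, 1, 0, 1},
        {1, 1, 0, 0, 0}
    };

    /** Ordinary integer product of two n-by-n matrices; three nested loops over n. */
    static int[][] multiply(int[][] x, int[][] y) {
        int n = x.length;
        int[][] z = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    z[i][j] += x[i][k] * y[k][j];
        return z;
    }

    /** A^r for r >= 1; entry [i][j] counts the paths of length r from v_{i+1} to v_{j+1}. */
    static int[][] power(int[][] a, int r) {
        int[][] result = a;
        for (int s = 2; s <= r; s++)
            result = multiply(result, a);
        return result;
    }

    public static void main(String[] args) {
        // Number of paths of length 2 and of length 4 from v4 to v2.
        System.out.println(power(A, 2)[3][1]); // prints 2
        System.out.println(power(A, 4)[3][1]); // prints 2
    }
}
```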
Suppose we wish to determine whether $v_j$ is reachable from $v_i$ (i.e., determine whether there exists a path of any length from $v_i$ to $v_j$). To decide this, we could consider all possible $A^r$ for $r = 1, 2, 3, \ldots$, and add them all together. This method is neither practical nor necessary, as we shall show. Recall that in a simple digraph with $n$ nodes, the length of an elementary path or cycle cannot exceed $n$. The $n$ value arises from the cycle that includes all the nodes. Also, any path can be converted to an elementary path by eliminating all cycles within the path, and any cycle can similarly be converted into an elementary cycle. Therefore, to determine whether there exists a path from $v_i$ to $v_j$, we only need to examine the paths of length less than or equal to $n$. Such paths and cycles are counted by $B_n$, where

\[
B_n = A + A^2 + A^3 + \cdots + A^n
\]

The element in the $i$th row and $j$th column of $B_n$ is equal to the number of paths of length $n$ or less that exist from $v_i$ to $v_j$. If this element is nonzero, then it is clear that $v_j$ is reachable from $v_i$. The matrix corresponding to the reachability relationship is called the path matrix (or reachability matrix) $P$. It is defined as follows:

\[
p_{ij} = \begin{cases}
1 & \text{if there exists a path from } v_i \text{ to } v_j \\
0 & \text{otherwise}
\end{cases}
\]

Note that the diagonal entry $p_{ii}$ indicates whether there is a cycle that includes $v_i$. As the preceding approach shows, the path matrix $P$ can be calculated as follows:

\[
p_{ij} = \begin{cases}
1 & \text{if the element in the } i\text{th row and } j\text{th column of } B_n \text{ is nonzero} \\
0 & \text{otherwise}
\end{cases}
\]

The time complexity of this approach is $\Theta(n^4)$, where $n$ is the number of nodes in the graph. This result follows from the fact that computation of each $A^r$ from $A^{r-1}$ is $\Theta(n^3)$, and the calculation needs to be done for $r = 2, 3, \ldots, n$. Such an approach is not very efficient, except, possibly, for very small values of $n$. The calculations can be simplified somewhat, since we are only interested in the existence of paths rather than in counting them, but the approach still requires time $\Theta(n^4)$.
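A minimal sketch of this brute-force approach follows, using the simplification mentioned above: each power is kept as a 0/1 matrix that records only whether a path of that length exists. The class PathMatrixSketch and the method pathMatrix are hypothetical names, and the input is assumed to contain only 0s and 1s.

```java
// Illustrative sketch only: hypothetical names, not part of dslib.
class PathMatrixSketch {
    /** The path matrix obtained by accumulating the 0/1 versions of A, A^2, ..., A^n. */
    static int[][] pathMatrix(int[][] a) {
        int n = a.length;
        int[][] power = new int[n][]; // 0/1 version of the current A^r
        int[][] p = new int[n][];     // 0/1 version of B_r accumulated so far
        for (int i = 0; i < n; i++) {
            power[i] = a[i].clone();
            p[i] = a[i].clone();
        }
        for (int r = 2; r <= n; r++) {         // n - 1 matrix products ...
            int[][] next = new int[n][n];
            for (int i = 0; i < n; i++)        // ... each with three nested loops over n
                for (int j = 0; j < n; j++)
                    for (int k = 0; k < n; k++)
                        next[i][j] |= power[i][k] & a[k][j];
            power = next;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    p[i][j] |= power[i][j];
        }
        return p;
    }
}
```

The outer loop over r wraps a triple loop over n, which is where the fourth factor of n in the running time comes from.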
19.6.2
Efficient Reachability Algorithm
We now explore a more efficient method, known as Warshall's algorithm, for computing the path or reachability matrix. The goal of such an approach is to generate a sequence of matrices $P^{(0)}, P^{(1)}, P^{(2)}, \ldots, P^{(n)}$ for a graph of $n$ nodes, such that $P^{(n)} = P$ (the path matrix). Consider a path from $v_i$ to $v_j$. Such a path is either only the edge $(v_i, v_j)$, with no intermediate nodes, or else it contains a number of intermediate nodes. For example, the path $\langle v_4, v_3 \rangle$ has no intermediate nodes, since it is one edge, whereas the path $\langle v_1, v_4, v_3, v_2 \rangle$ has two intermediate nodes, $v_4$ and $v_3$. The key to the definition of the $P^{(k)}$s, $0 \le k \le n$, is the nodes allowed as intermediate nodes on a path. The general definition is

\[
p^{(k)}_{ij} = \begin{cases}
1 & \text{if there exists an edge from } v_i \text{ to } v_j \text{, or a path from } v_i \text{ to } v_j \text{ using only intermediate nodes from } \{v_1, v_2, \ldots, v_k\} \\
0 & \text{otherwise}
\end{cases}
\]
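(a) $P^{(0)}$, intermediate set $\{\,\}$; (b) $P^{(1)}$, intermediate set $\{v_1\}$; (c) $P^{(2)}$, intermediate set $\{v_1, v_2\}$:

\[
P^{(0)} = \begin{pmatrix}
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 0
\end{pmatrix}
\quad
P^{(1)} = \begin{pmatrix}
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 1 & 0
\end{pmatrix}
\quad
P^{(2)} = \begin{pmatrix}
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 \\
1 & 1 & 1 & 1 & 0
\end{pmatrix}
\]

(d) $P^{(3)}$, intermediate set $\{v_1, v_2, v_3\}$; (e) $P^{(4)}$, intermediate set $\{v_1, v_2, v_3, v_4\}$; (f) $P^{(5)}$, intermediate set $\{v_1, v_2, v_3, v_4, v_5\}$:

\[
P^{(3)} = \begin{pmatrix}
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 \\
1 & 1 & 1 & 1 & 0
\end{pmatrix}
\quad
P^{(4)} = \begin{pmatrix}
0 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 \\
1 & 1 & 1 & 1 & 1
\end{pmatrix}
\quad
P^{(5)} = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 \\
1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1
\end{pmatrix}
\]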
Figure 19.37. Trace of successive iterations, using Warshall's approach

$P^{(0)} = A$ (the adjacency matrix), since $P^{(0)}$ does not allow any intermediate nodes, only the direct edges of $A$. Next, $P^{(1)}$ considers paths where the set of possible intermediate nodes is $\{v_1\}$ (i.e., the paths considered have one edge with no intermediate nodes, or else have $v_1$ as the only intermediate node). In our sample graph of Figure 19.36, there is one path with only node $v_1$ as an intermediate node, the path $\langle v_5, v_1, v_4 \rangle$. Therefore, $P^{(1)}$ has the 1s of $A$ (for the one-edged paths) plus a 1 in position (5, 4). This is shown in Figure 19.37b. Thus, a general element $p^{(1)}_{ij}$ is given by

\[
p^{(1)}_{ij} = \begin{cases}
1 & \text{if there exists an edge from } v_i \text{ to } v_j \text{, or there is an edge from } v_i \text{ to } v_1 \text{ and an edge from } v_1 \text{ to } v_j \\
0 & \text{otherwise}
\end{cases}
\]

The general element $p^{(1)}_{ij}$ can be calculated by considering the two cases. If there is an edge from $v_i$ to $v_j$, that is, $p^{(0)}_{ij} = 1$ (since $P^{(0)} = A$), $p^{(1)}_{ij}$ should be set to 1. If this does not hold, it is necessary to check whether there is an edge from $v_i$ to $v_1$ and one from $v_1$ to $v_j$. If both of these edges exist, $p^{(1)}_{ij}$ should be set to 1. Consider the product $p^{(0)}_{i1} p^{(0)}_{1j}$. This product is 1 if both edges exist and 0 otherwise, which is exactly what we want. Therefore, a general element of $P^{(1)}$ can be calculated by

\[
p^{(1)}_{ij} := \begin{cases}
1 & \text{if } p^{(0)}_{ij} = 1 \text{, i.e., } a_{ij} = 1 \\
p^{(0)}_{i1}\, p^{(0)}_{1j} & \text{otherwise}
\end{cases}
\]
Matrix $P^{(2)}$ considers paths from any node to any other node where each path can have as intermediate nodes only $v_1$, only $v_2$, both $v_1$ and $v_2$, or neither of them. Hence, we define its general element $p^{(2)}_{ij}$ as follows:

\[
p^{(2)}_{ij} = \begin{cases}
1 & \text{if there exists an edge from } v_i \text{ to } v_j \text{, or a path from } v_i \text{ to } v_j \text{ using only pivots (intermediate nodes) from } \{v_1, v_2\} \\
0 & \text{otherwise}
\end{cases}
\]

[Figure 19.38. Using $v_k$ as an intermediate node in computing the matrix $P^{(k)}$: the path from $v_i$ to $v_j$ (entry $p^{(k-1)}_{ij}$), the path from $v_i$ to $v_k$ (entry $p^{(k-1)}_{ik}$), and the path from $v_k$ to $v_j$ (entry $p^{(k-1)}_{kj}$) each possibly use $v_1, v_2, \ldots, v_{k-1}$]
In the current example, using $v_2$ as an intermediate node reveals paths $v_5$ to $v_3$ and $v_3$ to $v_3$. These two paths are added to the ones in $P^{(1)}$ to obtain the $P^{(2)}$ given in Figure 19.37c. In the graph, there are no paths with both $v_1$ and $v_2$ as intermediate nodes and no others. However, the calculation of $P^{(2)}$ must consider such a possibility. It turns out that it is not difficult, since we can use $P^{(1)}$. A path from $v_i$ to $v_j$ that has both $v_1$ and $v_2$ as intermediate nodes is either $\langle v_i, v_1, v_2, v_j \rangle$ or $\langle v_i, v_2, v_1, v_j \rangle$. But the possibility of using $v_1$ on a path to or from $v_2$ has already been considered in $P^{(1)}$. Therefore, a valid path with intermediate nodes from $\{v_1, v_2\}$ either already exists in $P^{(1)}$, or else it must use $v_2$ as an intermediate node, where $v_1$ might be used on the way to $v_2$, or on the way from it. As a result, a general element can be computed by

\[
p^{(2)}_{ij} := \begin{cases}
1 & \text{if } p^{(1)}_{ij} = 1 \\
p^{(1)}_{i2}\, p^{(1)}_{2j} & \text{otherwise}
\end{cases}
\]

In general, $P^{(k)}$ considers paths with intermediate nodes from $\{v_1, v_2, \ldots, v_k\}$. Such a path either does not include node $v_k$ or includes it, as shown by the wavy lines in Figure 19.38. If such a path does not include $v_k$, the path already exists in $P^{(k-1)}$. If node $v_k$ is on the path, any of $v_1, \ldots, v_{k-1}$ may be on the part of the path to $v_k$ or the part of the path from it. Thus, we can compute $p^{(k)}_{ij}$ from $P^{(k-1)}$ as follows:

\[
p^{(k)}_{ij} := \begin{cases}
1 & \text{if } p^{(k-1)}_{ij} = 1 \\
p^{(k-1)}_{ik}\, p^{(k-1)}_{kj} & \text{otherwise}
\end{cases}
\]

For a graph with $n$ nodes, since all nodes are allowed as intermediate nodes in $P^{(n)}$, the path matrix $P$ is given by $P^{(n)}$. For our current example, to complete the computation
/** The reachability (path) matrix computed by means of Warshall's algorithm.
    Analysis: Time = O(n^3), where n is the number of nodes in the graph */
public int[][] warshallsAlg(int[][] a)
{
    /* Copy a row by row; a.clone() alone would share the rows with a. */
    int[][] w = new int[a.length][];
    for (int r = 0; r < a.length; r++)
        w[r] = a[r].clone();
    for (int k = 1; k <= a.length; k++)
        for (int i = 1; i <= a.length; i++)
            for (int j = 1; j <= a.length; j++)
                if (matrixItem(w, i, j) == 0)
                    matrixSetItem(w, matrixItem(w, i, k) * matrixItem(w, k, j), i, j);
    return w;
}
Figure 19.39. Warshall's algorithm for computing the path matrix of a graph

of the path matrix, it is necessary to compute $P^{(3)}$ to $P^{(5)}$. These results are given in Figure 19.37d to Figure 19.37f, where the path matrix $P = P^{(5)}$. Note that a path is discovered when the matrix is computed for the highest ranked intermediate node. In particular, the path $\langle v_1, v_4, v_3, v_2 \rangle$ is discovered when doing $p^{(4)}_{12} = p^{(3)}_{14}\, p^{(3)}_{42}$. So far, our discussion of the algorithm involves the computation of $n + 1$ matrices, $P^{(0)}, P^{(1)}, \ldots, P^{(n)}$. However, each matrix involves all the 1s from the previous matrix plus some more. As a result, only one matrix is needed. The resulting algorithm, which is due to Warshall, is given in Figure 19.39. Note that the integers 1 through $n$ are used instead of the node set $\{v_1, v_2, \ldots, v_n\}$. Moreover, a new value is placed in location (i, j) only when it previously had a value of 0. The timing of Warshall's algorithm is easy to analyze. The number of times that the if (w[i][j] == 0) statement is performed is $\Theta(n^3)$. This algorithm is more efficient than the $\Theta(n^4)$ brute-force approach discussed earlier.
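For readers who want to run the algorithm outside the library, the following is a self-contained sketch of the same single-matrix idea. It assumes 0-based array indices and a 0/1 input matrix; the class WarshallSketch and the method warshall are hypothetical names, and direct array access replaces the matrixItem and matrixSetItem helpers.

```java
// Illustrative sketch only: hypothetical names, not part of dslib.
class WarshallSketch {
    /** The path matrix of the digraph whose 0/1 adjacency matrix is a; a is left unchanged. */
    static int[][] warshall(int[][] a) {
        int n = a.length;
        int[][] w = new int[n][];
        for (int i = 0; i < n; i++)
            w[i] = a[i].clone(); // copy each row so that the caller's matrix is not modified
        for (int k = 0; k < n; k++)          // node k+1 becomes available as an intermediate node
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (w[i][j] == 0)        // only a 0 entry can change
                        w[i][j] = w[i][k] * w[k][j];
        return w;
    }
}
```

Applied to the adjacency matrix of Figure 19.36, this sketch produces the matrix $P^{(5)}$ of Figure 19.37f.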
19.6.3
All Pairs Shortest Paths Algorithm
Rather than only being interested in the existence of a path, often the edges have weights and the shortest path between two nodes is desired. An example of a weighted graph is the highway system shown in Figure 19.2. In that context, the length of a path is the sum of the weights of the edges on the path. Instead of starting with the adjacency matrix, it is convenient to start with a matrix that stores the length of the edge between each pair of nodes. If a pair is not joined by an edge, the value stored is INF (to stand for $\infty$) or some very large integer. This large value is used to show that there is no edge between the nodes in question and that no shortest path should go directly from the first node to the second one. The distance from any node to itself, without traversing any edges, is assumed to be 0, and its corresponding entry in the matrix is 0. This matrix will be called $W$. Provided all the weights are nonnegative, a variation of Warshall's algorithm can be used to obtain a matrix that contains the shortest paths between all pairs of nodes. The input for this algorithm is the $W$ matrix. As was done in Warshall's algorithm, we develop the algorithm by computing a sequence of matrices. Each matrix stores the lengths of the shortest paths when the paths are restricted to having certain intermediate nodes. The objective is to compute matrix $D$, where $d_{ij}$ stores the distance from $i$ to $j$ (i.e., the length of the shortest path in the graph from node $i$ to node $j$). We also define $D^{(k)}$ to be the matrix of the lengths of the shortest paths when the set of intermediate nodes is
restricted to $\{1, 2, \ldots, k\}$. The general definition is

$$d_{ij}^{(k)} = \text{the length of the shortest path from } i \text{ to } j \text{ such that the path can only use intermediate nodes from } \{1, 2, \ldots, k\}$$

When $k = 0$, no intermediate nodes are allowed, and $D^{(0)} = W$. Also, $D^{(n)} = D$, since $D^{(n)}$ allows every node to be used as an intermediate node. As with Warshall's algorithm, $D^{(k-1)}$ is used in the computation of $D^{(k)}$. For a given pair (i, j), the shortest path for $D^{(k)}$ will either use node k or not. The two paths to be considered are shown in Figure 19.40: (1) the shortest path from i to j only uses $\{1, 2, \ldots, k-1\}$ as intermediate nodes, $d_{ij}^{(k-1)}$, and (2) the shortest path from i to j consists of the shortest path from i to k by only using $\{1, 2, \ldots, k-1\}$ as intermediate nodes, $d_{ik}^{(k-1)}$, followed by the shortest path from k to j by only using $\{1, 2, \ldots, k-1\}$ as intermediate nodes, $d_{kj}^{(k-1)}$. Thus, each element of $D^{(k)}$ can be computed by

$$d_{ij}^{(k)} = \min\left(d_{ij}^{(k-1)},\; d_{ik}^{(k-1)} + d_{kj}^{(k-1)}\right)$$

[Figure 19.40: a path from i to k of length $d_{ik}^{(k-1)}$ followed by a path from k to j of length $d_{kj}^{(k-1)}$, drawn against the path from i to j of length $d_{ij}^{(k-1)}$]
Figure 19.40. Using k as an intermediate node in computing the distance matrix $D^{(k)}$
As with Warshall's algorithm, there is no need for many matrices. Since the new value in a location is the minimum of itself and a new value, all the information can be stored in the same matrix. The resulting algorithm, known as Floyd's algorithm, is shown in Figure 19.41. Rather than returning d, Floyd's algorithm returns a second matrix, p, that is used to record intermediate node information. This information can be used to generate the sequence of nodes that form each minimum path. To see it, note that all elements in p are initially 0. When the procedure completes its task, p(i, j) contains the index of the node that caused a change to d(i, j). As a result, if p(i, j) = 0, the shortest path from i to j is a direct path along the edge connecting nodes i and j; otherwise, if p(i, j) = k, the shortest path from i to j passes through node k (i.e., it consists of the shortest path from i to k followed by the shortest path from k to j). Such a recursive specification allows one to reconstruct the entire path.

We now give a partial trace of the procedure by using the simple weighted digraph of Figure 19.42, with its initial d and p matrices given in Figure 19.42a. The first iteration uses node 1 as an intermediate node. The first pair of nodes of interest is when i = 2 and j = 4. Since d(2, 4) initially has a value of INF, it is changed to d(2, 1) + d(1, 4) = 3 + 1 = 4. Also, p(2, 4) is set to 1, indicating that node 1 is an intermediate node in the path from node 2 to node 4. Similarly, d(3, 4) is set to d(3, 1) + d(1, 4) = 5 + 1 = 6
with p(3, 4) = 1. The revised matrices for d and p at the end of the first iteration are given in Figure 19.42b. In the second iteration, node 2 is used to shorten the 3-to-1, 3-to-4, 4-to-1, and 4-to-3 paths, as shown in Figure 19.42c. Node 3 is not useful to shorten any paths, but node 4 shortens paths from node 1, as shown in Figure 19.42d and Figure 19.42e.

    /** The shortest paths matrix computed by means of Floyd's algorithm.
        Analysis: Time = O(n^3), where n is the number of nodes in the graph */
    public int[][] floydsAlg(int[][] w)
    {
        int[][] d = (int[][]) w.clone();
        int[][] p = new int[w.length][w.length];
        for (int k = 1; k <= w.length; k++)
            for (int i = 1; i <= w.length; i++)
                for (int j = 1; j <= w.length; j++)
                    if (matrixItem(d, i, k) + matrixItem(d, k, j) < matrixItem(d, i, j))
                    {
                        matrixSetItem(d, matrixItem(d, i, k) + matrixItem(d, k, j), i, j);
                        matrixSetItem(p, k, i, j);
                    }
        return p;
    }

Figure 19.41. Floyd's algorithm for computing the shortest paths of a graph

    (a) k = 0
        d:   0  INF  INF    1        p:  0  0  0  0
             3    0    1  INF            0  0  0  0
             5    1    0  INF            0  0  0  0
             8    1    4    0            0  0  0  0

    (b) k = 1
        d:   0  INF  INF    1        p:  0  0  0  0
             3    0    1    4            0  0  0  1
             5    1    0    6            0  0  0  1
             8    1    4    0            0  0  0  0

    (c) k = 2
        d:   0  INF  INF    1        p:  0  0  0  0
             3    0    1    4            0  0  0  1
             4    1    0    5            2  0  0  2
             4    1    2    0            2  0  2  0

    (d) k = 3
        d:   0  INF  INF    1        p:  0  0  0  0
             3    0    1    4            0  0  0  1
             4    1    0    5            2  0  0  2
             4    1    2    0            2  0  2  0

    (e) k = 4
        d:   0    2    3    1        p:  0  4  4  0
             3    0    1    4            0  0  0  1
             4    1    0    5            2  0  0  2
             4    1    2    0            2  0  2  0

Figure 19.42. Trace of minimum path lengths in a weighted digraph

The information in the p matrix can be used to generate the nodes on each minimum path. For example, assume that we want to find the nodes that lie on the minimum path
from node 1 to node 3. The element p(1, 3) = 4 in Figure 19.42e indicates that node 4 is an intermediate node between nodes 1 and 3. Because p(1, 4) = 0, there are no intermediate nodes between 1 and 4, so the edge from node 1 to node 4 is part of the path. Since p(4, 3) = 2, node 2 is an intermediate node between node 4 and node 3. Finally, p(4, 2) and p(2, 3) are both 0, indicating that there are no more intermediate nodes. Therefore, the minimum path is the sequence of directed edges (1, 4), (4, 2), (2, 3).

The timing of the procedure for determining the shortest paths in a graph of n nodes, using its weight matrix w, is easily seen to be $\Theta(n^3)$.
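The trace and the path recovery just described can be checked with a small standalone sketch. It is only an illustration under stated assumptions: it uses plain int arrays with row and column 0 unused instead of the matrixItem and matrixSetItem helpers, it updates the distance matrix in place, and the names FloydSketch, floyd, intermediates, and path are hypothetical. INF is taken to be Integer.MAX_VALUE / 2 so that INF + INF does not overflow.

```java
import java.util.ArrayList;
import java.util.List;

public class FloydSketch {
    static final int INF = Integer.MAX_VALUE / 2;   // assumption: large, but INF + INF still fits in an int

    /** Runs Floyd's algorithm in place on d (1-based; row and column 0 unused) and returns p. */
    static int[][] floyd(int[][] d) {
        int n = d.length - 1;
        int[][] p = new int[n + 1][n + 1];           // all 0: no intermediate node recorded yet
        for (int k = 1; k <= n; k++)
            for (int i = 1; i <= n; i++)
                for (int j = 1; j <= n; j++)
                    if (d[i][k] + d[k][j] < d[i][j]) {
                        d[i][j] = d[i][k] + d[k][j];
                        p[i][j] = k;                 // node k caused the change to d(i, j)
                    }
        return p;
    }

    /** Appends to out the nodes strictly between i and j on the minimum path. */
    static void intermediates(int[][] p, int i, int j, List<Integer> out) {
        int k = p[i][j];
        if (k == 0) return;                          // direct edge, nothing in between
        intermediates(p, i, k, out);
        out.add(k);
        intermediates(p, k, j, out);
    }

    /** Node sequence of the minimum path from i to j; assumes i != j and that such a path exists. */
    static List<Integer> path(int[][] p, int i, int j) {
        List<Integer> nodes = new ArrayList<>();
        nodes.add(i);
        intermediates(p, i, j, nodes);
        nodes.add(j);
        return nodes;
    }

    public static void main(String[] args) {
        // Initial d matrix of Figure 19.42a
        int[][] d = {
            {0, 0, 0, 0, 0},
            {0, 0, INF, INF, 1},
            {0, 3, 0, 1, INF},
            {0, 5, 1, 0, INF},
            {0, 8, 1, 4, 0}
        };
        int[][] p = floyd(d);
        System.out.println(path(p, 1, 3) + " length " + d[1][3]);   // [1, 4, 2, 3] length 3
    }
}
```

The recursion in intermediates follows the recursive specification of p directly: the nodes between i and j are those between i and k, then k itself, then those between k and j.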
19.6.4 Single Source Shortest Paths Algorithm
Recall that in a graph, the length of a path is the sum of the lengths of the edges on the path, and the distance between two nodes is the length of the shortest path. Floyd's algorithm finds the distance between every pair of nodes in time $O(n^3)$. If it is only necessary to find the shortest paths from some specific node to every node, as is often the case, such paths can be determined more efficiently. In particular, as long as all edge lengths are nonnegative, an algorithm developed by Edsger Dijkstra can be used. Suppose the length of an edge (u, v) is given by w(u, v), and the objective is to compute d(u), the distance from s to u, for every node u. The key to Dijkstra's algorithm is to determine the distances from s in increasing order.

Dijkstra's algorithm is called a greedy algorithm because at each step it makes its choice of what to do next based on what appears to be the best alternative based on the local information. The algorithm does not attempt to consider the big picture of considering how all the shortest paths might interact. Instead, it simply finds the shortest path from the source, then the second shortest path, the third shortest, and so on, until all the shortest paths have been found. It turns out that this greedy algorithm does in fact find all shortest paths from the source. However, many greedy algorithms do not find the best solution to the problem that they are solving.

Returning to our problem, first note that the shortest path from s to itself is the empty path of length 0. For every other node v, the shortest path from s to v either goes directly from s to v, or else consists of a path from s to some node u, followed by the edge from u to v. In the second case, the path from s to u must be a shortest path from s to u; otherwise, a shorter s-to-u path would shorten the s-to-v path. Also, u must be closer to s than v, as we are assuming nonnegative edge lengths.
In the general case, the shortest path to v ends with an edge into v from a node u closer to s, where u might be s itself. This leads to the idea of finding the paths in the order from shortest to longest, as each new path to some node consists of an existing shortest path followed by one edge to the new node. Suppose that the set of nodes is partitioned into the set C (for closest) and the set $V - C$. Informally, C is a set of nodes whose distances from s are known to be smaller than for those nodes not in C. More formally, for the set C, it is required that every node in it has its distance already calculated, and also every node in it has a distance that is known to be at least as small as the distance for every node not in C.

Initially, C = {s}. The next node to be added to C will be the node with the shortest edge connecting s to it. In general, the next node to be added to C, call it v, will be the next closest node to s, in terms of shortest path, of those not already in C. The shortest path from s to v will not pass through a node not in C, as all nodes in $V - C$ are at least as far from s as is v. Thus, the shortest path to v consists of a shortest path from s to some node u in C followed by the edge (u, v). Therefore, for each $x \notin C$, define

$$td(x) = \min\{\, d(u) + w(u, x) \mid u \in C \,\}$$
    C ← {s}
    d(s) ← 0
    while C ≠ V
        for each x ∉ C
            compute td(x)
        select v ∉ C such that td(v) ≤ td(x), for all x ∉ C
        C ← C ∪ {v}
        d(v) ← td(v)

Figure 19.43. Greedy algorithm for single source shortest paths

[Figure 19.44: panels showing the nodes labeled by their d or td values. (a) the weighted undirected graph; (b) C = {a}: a(0), b(9), c(12), d(6), e(10); (c) C = {a, d}: b(8), c(12), e(9); (d) C = {a, b, d}: c(10), e(9); (e) C = {a, b, d, e}; (f) C = V: the shortest paths tree with a(0), b(8), c(10), d(6), e(9)]
Figure 19.44. Trace of Dijkstra's algorithm
where td stands for tentative distance. The node v closest to s will have td(v) = d(v), and its td value will be the smallest. Therefore, the node with the smallest td value can be moved into C and its d value assigned its td value. This completes one iteration, and the process can be repeated to add the remaining nodes. The resulting general algorithm is given in Figure 19.43.

To see how this algorithm works, consider the graph in Figure 19.44a. An undirected graph was selected with a label on each edge to indicate its length. Assuming that the source is node a, Figure 19.44b shows node a in set C with label 0, its distance from a. The other nodes are in set $V - C$ and are labeled by their td values. The td value for a
node of $V - C$ is determined by an edge from the node to a node of C. In Figure 19.44b, these edges are shown dashed and labeled by their length. As Figure 19.44b shows, the node closest to a is d, so Figure 19.44c shows d added to set C and labeled by its distance. After adding d to C, the new values for td need to be calculated for nodes in $V - C$. Figure 19.44c shows that c's shortest path from a is still the direct edge, whereas the shortest paths from a to b and to e go through node d. Now, node b is closest to a, so Figure 19.44d shows node b in set C. In this figure, it is seen that node c has found a better route to node a by first going to node b. At this time, e is closest to a, so it is added next. Note that c is not added, even though its dashed edge has a shorter length, as e has a smaller td value. Finally, c is added to complete the trace of the algorithm. The result, given in Figure 19.44f, is a tree of shortest paths called the shortest paths tree.

Like Prim's and Floyd's algorithms, in each iteration, we only consider paths that can have certain interior nodes. For Dijkstra's algorithm, the start node is always s, and the interior nodes are restricted to be in $C - \{s\}$. Each td value records the length of the shortest path of this form for a specific node of $V - C$. As the previous argument explains, the node of $V - C$ with smallest td value can be moved to C, with its distance being set to its td value.

The only nontrivial parts to the general algorithm given in Figure 19.43 are the calculation of the td values and the selection of v. If these parts are implemented in the obvious way, a needlessly inefficient algorithm is obtained. To see this, consider the calculation of the td values. The outer loop is performed $O(n)$ times, and most of the time both $V - C$ and C have size $O(n)$ (each has about half the nodes), which results in time $O(n^3)$ to compute all the td values. We had hoped to do better, as this is the time bound for Floyd's algorithm.
The key to improving the algorithm is to note that we do not need to start over when computing the td values. Suppose that the td values are known for a certain set C and that node u is added to C. To update the td value for some node x, note that the only new path that might be a shortest path from s to x (as a result of adding u to C) is the shortest path from s to u followed by the (u, x) edge. The length of this new path needs to be compared with the current value for td, that is,

$$td(x) \leftarrow \min(td(x),\; d(u) + w(u, x))$$

Thus, td(x) can be updated in constant time instead of $O(|C|)$ time, a significant improvement. For example, when node b was added to C moving from Figure 19.44c to Figure 19.44d, the update of td(c) only needs to consider the path formed by adding the edge (b, c) to the shortest path to b. It did not need to reconsider the path to c through d, or the path directly from a, as the length of the best of them was already recorded in the td(c) value of Figure 19.44c. The algorithm can be started with C empty, provided that initially td(s) = 0 and td(x) = INF for $x \neq s$, where INF is short for infinity or some other large value. Incorporating the better way to update td, the modified general algorithm, Dijkstra's algorithm, is shown in Figure 19.45.

With the new general algorithm, to update all of the td values for the addition of v to C, it is necessary to find all the nodes adjacent to v. If the adjacency matrix representation is used, it will take time $O(n)$ to find these nodes. If the adjacency lists representation is used, the time is only $O(degree(v))$. Note that the sum of the degrees of the nodes is $O(m)$, where m is the number of edges in the graph. Therefore, if the adjacency lists representation is used, the updates of all the td values for all the times through the loop are performed in
time $O(m)$. As the loop is performed n times and $O(n)$ time is needed to select v each time through the loop, the overall time bound for the algorithm is $O(n + n^2 + m) = O(n^2)$.

    for each x ∈ V
        td(x) ← INF
    td(s) ← 0
    C ← ∅
    while C ≠ V
        select v ∉ C such that td(v) ≤ td(x), for all x ∉ C
        C ← C ∪ {v}
        d(v) ← td(v)
        for each x adjacent to v
            if x ∉ C
                td(x) ← min(td(x), d(v) + w(v, x))

Figure 19.45. Dijkstra's algorithm for shortest paths

A slightly different implementation yields a different time bound. In the preceding algorithm, the time is dominated by the selection of v (i.e., selection of the minimum item from a set). The data structure designed to efficiently select the smallest item is a priority queue. Thus, it makes sense to store the nodes of $V - C$ in a priority queue ordered by their td values. If the priority queue is implemented by a balanced tree (see Section 11.2.2, or the heap data structure of Section 18.6), then insertion, finding the minimum, and deletion can all be done in time $O(\log n)$. Therefore, all the minimizations can be done in time $O(n \log n)$. However, this part of the algorithm no longer dominates the time required by the algorithm. When a node receives a new td value, the node must be shifted in the priority queue to a position that reflects its new priority. This shift can require time $O(\log n)$, and nearly every edge in the graph might result in a td value being reduced. Hence, the time for all the updates to the td values is now bounded by $O(m \log n)$. This bound becomes the time bound for the current implementation.

We now have two algorithms, one with time bound $O(n^2)$ and one with time bound $O(m \log n)$. As the value for m can vary from $O(n)$ to $O(n^2)$, neither algorithm is best for all input graphs. Of course, depending on the value of m, the appropriate choice can be made, so an $O(\min(n^2, m \log n))$ algorithm is obtained.

The complete Java implementation of the algorithms is left to the problems. Note that in an actual implementation, the td field is only used for nodes in $V - C$ and the d field is only used for nodes in C. Therefore, there is no need for both fields, and both values can be stored in the d field.
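To make the two time bounds concrete, the following standalone sketch shows both variants on plain arrays rather than on the graph classes of this chapter. Everything named here (DijkstraSketch, dijkstraScan, dijkstraQueue, the 0-based node numbering) is hypothetical. The edge list is an assumption inferred from the td labels in the trace of Figure 19.44, so the figure may contain further edges that do not affect the distances. The second variant uses java.util.PriorityQueue, which offers no operation for shifting an item in place, so it swaps in a different technique: a node is re-inserted with its new td value and stale entries are skipped when they surface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class DijkstraSketch {
    static final int INF = Integer.MAX_VALUE;

    /** O(n^2) variant: linear scan to select v; w is an adjacency matrix with INF for "no edge". */
    static int[] dijkstraScan(int[][] w, int s) {
        int n = w.length;
        int[] d = new int[n];                 // holds td(x) while x is outside C, d(x) afterwards
        boolean[] inC = new boolean[n];
        Arrays.fill(d, INF);
        d[s] = 0;
        for (int round = 0; round < n; round++) {
            int v = -1;
            for (int x = 0; x < n; x++)       // select v not in C with the smallest td value
                if (!inC[x] && (v == -1 || d[x] < d[v])) v = x;
            if (d[v] == INF) break;           // the remaining nodes cannot be reached from s
            inC[v] = true;
            for (int x = 0; x < n; x++)
                if (!inC[x] && w[v][x] != INF && d[v] + w[v][x] < d[x])
                    d[x] = d[v] + w[v][x];
        }
        return d;
    }

    /** O(m log n) variant: adj.get(v) holds {neighbor, weight} pairs. */
    static int[] dijkstraQueue(List<List<int[]>> adj, int s) {
        int n = adj.size();
        int[] d = new int[n];
        boolean[] inC = new boolean[n];
        Arrays.fill(d, INF);
        d[s] = 0;
        PriorityQueue<int[]> pq = new PriorityQueue<>((p, q) -> Integer.compare(p[1], q[1]));
        pq.add(new int[] {s, 0});             // entries are {node, td value at insertion time}
        while (!pq.isEmpty()) {
            int v = pq.poll()[0];
            if (inC[v]) continue;             // stale entry: v was already moved into C
            inC[v] = true;
            for (int[] e : adj.get(v)) {
                int x = e[0];
                if (!inC[x] && d[v] + e[1] < d[x]) {
                    d[x] = d[v] + e[1];
                    pq.add(new int[] {x, d[x]});   // re-insert instead of shifting in place
                }
            }
        }
        return d;
    }

    public static void main(String[] args) {
        // Nodes a..e are numbered 0..4; edges {u, v, length} are assumed from the trace
        int[][] edges = {{0, 1, 9}, {0, 2, 12}, {0, 3, 6}, {0, 4, 10}, {3, 1, 2}, {3, 4, 3}, {1, 2, 2}};
        int n = 5;
        int[][] w = new int[n][n];
        List<List<int[]>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Arrays.fill(w[i], INF);
            w[i][i] = 0;
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {               // undirected: record each edge in both directions
            w[e[0]][e[1]] = e[2];
            w[e[1]][e[0]] = e[2];
            adj.get(e[0]).add(new int[] {e[1], e[2]});
            adj.get(e[1]).add(new int[] {e[0], e[2]});
        }
        System.out.println(Arrays.toString(dijkstraScan(w, 0)));    // [0, 8, 10, 6, 9]
        System.out.println(Arrays.toString(dijkstraQueue(adj, 0))); // [0, 8, 10, 6, 9]
    }
}
```

Both methods keep a single array d, in line with the observation that the td and d values never need to be stored at the same time. With re-insertion the queue can hold up to one entry per edge update, which still gives a bound of $O(m \log n)$ since $\log m$ is $O(\log n)$.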
Problems 19.6
1. Obtain the adjacency matrix A of the digraph given in Figure 19.46. Find the elementary paths of lengths 1 and 2 from v1 to v4. Show that there is also a simple path of length 4 from v1 to v4. Verify the results by calculating $A^2$, $A^3$, and $A^4$.

2. Obtain the adjacency matrix A of the digraph in Figure 19.42 and calculate its path matrix.

3. For a simple digraph G = (V, E), define a matrix X by $x_{ij} = k$, where k is the smallest integer for which $a_{ij}^{(k)} \neq 0$. Determine the matrix X of the digraph given in Figure 19.46. What does $x_{ij} = 1$ mean?

Figure 19.46. A digraph for path calculations

Figure 19.47. Undirected graphs for distance calculations

4. Suppose we are given an adjacency matrix representation of a graph, and we wish to find the following matrix:

$$C(i, j) = \begin{cases} 1 & \text{if there is a path from } i \text{ to } j \text{ subject to the restriction that only nodes 1, 3, 7, and 9 can appear on the path between nodes } i \text{ and } j \\ 0 & \text{if no such path exists} \end{cases}$$

Assume that the nodes are labeled 1, 2, ..., n.
(a) Formulate an algorithm for computing the desired information.
(b) Perform a timing analysis of the algorithm.

5. Use Warshall's algorithm on the graph given in Figure 19.46, and give the path matrix, P, after each iteration on the loop index k.

6. Obtain the distance matrix for the graph in Figure 19.47a by using function floydsAlg (Figure 19.41). Give a trace of your result in the same form as that given in Figure 19.42. Note that the graph is undirected, so the edge weight gives the distance to go in either direction along the edge.

7. Repeat the previous problem for Figure 19.47b.

8. In procedure floydsAlg(), a matrix p was used to record intermediate node information about the minimum distance paths between pairs of nodes in a graph. Formulate an algorithm that, when given a pair of nodes, the matrix p, and the number of nodes in the graph n, generates a sequence of nodes that lie on a minimum distance path for the pair.

9. For each of the weighted graphs in Figure 19.48, use function floydsAlg() to compute the d matrix. Give the initial distance matrix d for the digraph, and the d matrix after each node k has been used as an intermediate node.

10. There are applications, such as those dealing with network scheduling, where the paths with largest weight sums are of interest. In such applications, the graphs are directed and acyclic (i.e., have no cycles). One approach to generating the desired information is to alter the function floydsAlg() discussed earlier in the section. The maximum weighted path information can be stored in a matrix maxDist. Explore this possibility and in particular,
(a) What matrix should be used to represent the graph for the first iteration in the computation of the matrix maxDist?
(b) Using the representation of a graph determined in part (a) and the number of nodes in the graph (n) as inputs, formulate an algorithm for generating the maximum weighted path information that is to be stored in maxDist.
(c) Trace the algorithm obtained in part (b) for the graph in Figure 19.49. First show the initial matrix maxDist. Also, give the snapshot of the maxDist matrix after each iteration of using node k as an intermediate node.

11. Give the code to implement the priority queue version of Dijkstra's algorithm, using the adjacency lists representation of the graph.

12. Give the code to implement the nonpriority queue version of Dijkstra's algorithm, using the adjacency lists representation of the graph.

13. Give the code to implement the priority queue version of Dijkstra's algorithm, using the adjacency matrix representation of the graph. What is the time bound for this implementation?

14. Give the code to implement the nonpriority queue version of Dijkstra's algorithm, using the adjacency matrix representation of the graph. What is the time bound for this implementation?

Figure 19.48. Directed graphs for distance calculations

Figure 19.49. A digraph for maximum distance calculation
19.7 Traversals of Undirected Graphs

In the previous section, we explored determining the existence of paths between pairs of nodes in a graph. A more general problem involves testing a graph for some specific property or obtaining some particular part of the graph. In addition, in certain applications, it is required to traverse (i.e., visit or process each node in) a graph in some particular order. This section focuses on two general methods for searching or traversing a graph, breadth-first search (BFS) and depth-first search (DFS), that arise in a variety of applications and can be used to test for many properties. This section will search undirected graphs and use the term vertex rather than node. The same techniques are applied to directed graphs, but interpretation of the results of a traversal is sometimes slightly different. Some of these differences will be discussed later in the chapter.
19.7.1 Breadth-First Search
One of the main uses of the BFS is to find paths from a specified starting vertex to the other vertices that use the fewest edges (i.e., for each vertex v of the graph, find a path from the starting vertex to v such that the path has the minimum number of edges). This is a shortest paths problem in which each edge of the graph is given a weight of 1. The number of edges in a shortest path to a vertex v is called v's distance from the source. The key to doing the task is to traverse the vertices of the graph in the order of increasing distance from the start vertex, which is the traversal order of a BFS. Thus, a BFS first finds vertices of distance 0 from the start, then distance 1, distance 2, and so on. The approach used in such a search is called the greedy approach, since in each iteration, it extends the current solution by a vertex closest to the starting vertex.

The solid lines of Figure 19.50a show an undirected graph with its adjacency list representation given in Figure 19.50b. The dashed arrows show the traversal of the graph in a BFS starting at vertex v1. The number inside each vertex of the diagram is its distance from v1. The algorithm starts with v1 and finds the vertices at distance 1 (v2, v3, v4, and v5) and then the vertices at distance 2 (v6, v7, v8, and v9).

Now consider the general algorithm for a BFS when it starts at some vertex s. There is only one vertex of distance 0 from s, namely s itself, using the empty path from s to s. The vertices of distance 1 from s are the ones adjacent to s. They can easily be determined from the adjacency list of s. Each vertex of distance 2 from s is an unreached vertex that is adjacent to a vertex of distance 1 from s (distance 1 plus 1 edge yields distance 2). In Figure 19.50b, v6 has distance 2, since it is adjacent to vertex v2, which has a distance of 1. Vertex v3 does not have distance 2, even though it is adjacent to vertex v2, as it was previously reached and already has its distance determined.
In general, the vertices u of distance i are the vertices adjacent to a vertex of distance $i - 1$, such that u has not already
been reached. Thus, to find the vertices of distance i, it is only necessary to know the vertices of distance $i - 1$ and to know which vertices have already been reached. Therefore, the algorithm starts with a list of vertices at distance 1 from s. From this list, a list of vertices of distance 2 is obtained. Now, from the distance 2 list, the distance 3 list can be obtained, 3 yields 4, 4 yields 5, and so on. The lists are built one by one as the algorithm progresses from nearer vertices to farther vertices. In Figure 19.50, the distance 1 list has vertices v2, v3, v4, and v5. The distance 2 list has vertices v6, v7, v8, and v9.

(a) Breadth-first search of a graph [diagram: vertices v1 to v9, each labeled with its distance from v1; v2 to v5 have distance 1 and v6 to v9 have distance 2]

(b) Adjacency lists representation

    v1: v2, v3, v4, v5
    v2: v1, v3, v6
    v3: v1, v2, v4, v7, v8
    v4: v1, v3, v5, v8
    v5: v1, v4, v9
    v6: v2
    v7: v3
    v8: v3, v4
    v9: v5

(c) Shortest path tree [diagram: root v1; next level v2, v3, v4, v5; bottom level v6, v7, v8, v9]

Figure 19.50. An example of a BFS

However, a breadth-first traversal requires only one list. This list can store at its front the vertices of distance $i - 1$ that have not yet been processed. As each vertex is processed, it is removed from the list, and newly reached vertices (of distance i) are added to the end of the list. Thus, after processing all the vertices of distance $i - 1$, it will automatically move on to process those of distance i. The reader may have noticed that the list just described is a FIFO queue. The algorithm starts by placing the starting vertex s in a queue and marking it reached. Now each iteration proceeds by removing the vertex at the front of the queue and finding the vertices adjacent to it that have not yet been reached. Each vertex found in this way is marked reached and added to the end of the queue.

In summary, a general algorithm for the breadth-first search approach to finding the distance from s to each other vertex is given in Figure 19.51. Initially, s is the only vertex
that is marked reached, and it is the only vertex in the queue. When the first vertex of the queue is removed, all vertices adjacent to it (its neighbors) are considered. Any neighbor that is not marked does not already have a shorter path, so its shortest path is through the current vertex. Hence, its shortest path is calculated, and the neighbor is marked and added to the queue. The only part that is not described in detail is how to calculate the dist value of a vertex, which will be done after tracing the traversal.

    Procedure bfs(s)
        mark all vertices in the graph as not reached
        mark and visit s
        set the distance to s to be 0
        put s in queue
        while the queue is not empty
            remove the front element from the queue and call it the current vertex
            for each neighbor of the current vertex
                if the neighbor is not marked
                    visit and mark the neighbor
                    put neighbor in the queue
                    calculate the distance for the neighbor

Figure 19.51. General algorithm for a BFS from s

For the graph in Figure 19.50, we now trace a BFS where v1 is the starting vertex. Vertex v1 is visited first to discover the vertices that are adjacent to it (i.e., those of distance 1 from v1). Because the vertices in the adjacency list for v1 are ordered as shown in Figure 19.50(b), the vertices adjacent to v1 are placed in the queue in the following order:

    v2, v3, v4, v5

Since v2 was the first vertex adjacent to v1 to be discovered and saved in the queue, we next explore the vertices adjacent to v2. There are three of them, v1, v3, and v6, but the only one not marked is v6. Therefore, v6 is the only vertex added to the queue when v2 is processed. The process is repeated on the second vertex adjacent to v1, namely v3, which results in the discovery of vertices v7 and v8 that are placed in the queue. Note that vertices v1, v2, and v4 were eliminated as they were marked when their shortest paths were discovered earlier. We next explore the unvisited vertices adjacent to v4. There are none. Finally, the only unvisited vertex adjacent to v5 is v9. The BFS will continue to visit the remaining vertices in the queue, but no unvisited vertices will be found. The search completes when the queue becomes empty. The BFS strategy results in the traversal indicated by the dashed arrows in Figure 19.50a.
The vertices of the graph were visited and placed in the queue in the following order:

    v1, v2, v3, v4, v5, v6, v7, v8, v9

Note that when a vertex v is first reached, say from a vertex u, v should be marked as reached. If the objective is to find the distance of each vertex from s, then v's distance is 1 more than u's distance. This provides the way to calculate dist values; if v is first reached when it is found to be adjacent to vertex u, the distance of v is 1 more than the distance for u.

During a BFS of a graph, the edges of a shortest path tree are followed. For example, a shortest path tree with root v1 for the example graph in Figure 19.50a appears in Figure 19.50c. The vertices in the tree are searched in increasing level number order and within each level from left to right, as per a level-order traversal.

The implementation of this algorithm follows the general algorithm very closely. As the vertices need to be marked as reached when they are discovered, vertices need a reached field. Recall that Section 19.5.3 defines vertices with a reached field, and shows how to
create a graph with such vertices. Also, if the distances from the start vertex are to be calculated, it makes sense to have a dist field for each vertex and a method setDist() to change the value of the dist field. Assuming that this field and method exist for vertices of type DistVertex, a descendant of SearchVertexUos, Figure 19.52 gives the methods for the graph class to do a BFS from vertex s and compute dist. The methods setAllUnreached() and bfs() use the vertex iterator and the edge iterator of the graph class, respectively. To prevent the side effect of changing the client's current position, the graph's current position is saved at the start of bfs() and restored at the end. Also, a LinkedQueueUos<DistVertex> object is used for the queue of vertices. If the two statements s.setDist(0) and adjDistVertex.setDist(currentVertex.dist()+1) are removed and the vertex type is SearchVertexUos, the generic BFS is obtained. The generic methods for a BFS have been placed directly in the SearchGraphUos class of dslib.

In performing a BFS, which representation is best used for the graph (i.e., which container should be used to store the edges of the graph)? To answer this question, examine the frequency of use for the two common operations: adjacent(v1, v2) is not used at all, whereas for each vertex, it is necessary to find the incident edges. Table 19.2 on page 893, which compares the representations, indicates that the adjacency lists representation is preferable. Hence, the class SearchGraphLinkedRepUos or a descendant of it should be used.

Let us examine the timing analysis of procedure bfs when applied to a connected graph. Continuing with the notation for the ADT of a graph, n and m denote the number of vertices and the number of edges in the graph, respectively. Steps 1 to 6, as labeled by the comments, are executed once. The time for step 2 is $O(capacity)$, whereas the others take constant time. The statements within the if statement in step 13 are each executed $n - 1$ times (i.e., once for every vertex, except s). This follows from the fact that these statements are executed only when the reached value of a vertex is false, and the reached value is set to true in the first of these statements. Since these statements are done $n - 1$ times, $n - 1$ vertices are added to the queue in step 16. Thus, one vertex in step 6 and $n - 1$ vertices in step 16 are inserted into the queue for a total of n vertices. Consequently, step 7 is repeated $n + 1$ times (counting the time that the queue is empty), and steps 8, 9, and 10 are done n times. For an undirected graph with m edges, the adjacency lists contain a total of 2m entries. It then follows that steps 12, 13, and 17 are each performed a total of 2m times and that step 11 is done $n + 2m$ times. Therefore, the time analysis for the procedure shows that the time requirement is $O(capacity + m)$. Often capacity = n, so the time for the procedure bfs is $O(n + m)$.

What if the adjacency matrix representation were used? Then the time bound would be $O(n \cdot capacity)$, since for every vertex a row of the matrix must be scanned. If $m \in \Theta(n^2)$, there is little difference. However, if $m \in O(n)$ (as it may well be), with adjacency lists representation the time for bfs is in $O(n)$. In contrast, the adjacency matrix representation requires time $O(n^2)$, which can be an important difference for large n.

Recall that a graph is called connected if there is a path between every pair of vertices. The BFS approach will visit every vertex of an undirected graph, regardless of the start vertex, if the graph is connected. If the graph is not connected, procedure bfs will not visit each vertex. However, the procedure bfs can be modified to determine whether a graph is connected. This modification is left as an exercise.
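For readers who want to run the traversal without the dslib classes, the trace on the graph of Figure 19.50 can be reproduced with a minimal standalone sketch. It assumes plain int arrays for the adjacency lists (index 0 unused so that vertex v1 is index 1) and uses java.util.ArrayDeque as the FIFO queue; the names BfsSketch and bfs are hypothetical, and a distance of -1 doubles as the "not reached" mark instead of a separate reached field.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class BfsSketch {
    /**
     * Returns dist, where dist[v] is the number of edges on a shortest path from s to v,
     * or -1 if v is not reached. The vertices are appended to order as they are reached.
     */
    static int[] bfs(int[][] adj, int s, List<Integer> order) {
        int[] dist = new int[adj.length];
        Arrays.fill(dist, -1);                 // -1 means "not reached"
        Queue<Integer> q = new ArrayDeque<>();
        dist[s] = 0;
        order.add(s);
        q.add(s);
        while (!q.isEmpty()) {
            int u = q.remove();                // current vertex: front of the queue
            for (int v : adj[u]) {
                if (dist[v] == -1) {           // first time v is reached
                    dist[v] = dist[u] + 1;     // one more edge than the distance of u
                    order.add(v);
                    q.add(v);
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // Adjacency lists of Figure 19.50b; index 0 is unused
        int[][] adj = {
            {},
            {2, 3, 4, 5},
            {1, 3, 6},
            {1, 2, 4, 7, 8},
            {1, 3, 5, 8},
            {1, 4, 9},
            {2},
            {3},
            {3, 4},
            {5}
        };
        List<Integer> order = new ArrayList<>();
        int[] dist = bfs(adj, 1, order);
        System.out.println(order);                  // [1, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(Arrays.toString(dist));  // [-1, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    }
}
```

Each vertex enters the queue at most once and each adjacency list is scanned once, which matches the $O(n + m)$ bound derived above for the adjacency lists representation.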
    /** Set all the vertices to unreached.
        Analysis: Time = O(n), where n = number of vertices */
    public void setAllUnreached()
    {
        GraphPositionUos position = currentPosition();
        goFirst();
        while (!after())
        {
            item().setReached(false);
            goForth();
        }
        goPosition(position);
    }

    /** Breadth-first search of the graph to calculate distances.
        Analysis: Time = O(n + m), where n = number of vertices
                  and m = number of edges */
    public void bfs(DistVertex s)
    {
        GraphPositionUos position = currentPosition();                        // step 1
        setAllUnreached();                                                    // step 2
        LinkedQueueUos<DistVertex> q = new LinkedQueueUos<DistVertex>();      // step 3
        s.setReached(true);                                                   // step 4
        s.setDist(0);                                                         // step 5
        q.insert(s);                                                          // step 6
        while (!q.isEmpty())                                                  // step 7
        {
            DistVertex currentVertex = q.item();                              // step 8
            q.deleteItem();                                                   // step 9
            eGoFirst(currentVertex);                                          // step 10
            while (!eAfter())                                                 // step 11
            {
                DistVertex adjDistVertex = adjItem();                         // step 12
                if (!adjDistVertex.reached())                                 // step 13
                {
                    adjDistVertex.setReached(true);                           // step 14
                    adjDistVertex.setDist(currentVertex.dist() + 1);          // step 15
                    q.insert(adjDistVertex);                                  // step 16
                }
                eGoForth();                                                   // step 17
            }
        }
        goPosition(position);                                                 // step 18
    }

Figure 19.52. Code to perform a BFS and compute the distances from s

Problems 19.7.1

1. Trace the BFS algorithm on the graph of Figure 19.50 when the test for if (!adjDistVertex.reached()) is omitted and the statements within it are always executed. What happens?

2. Given an undirected graph that is not necessarily connected, give an algorithm based on the BFS approach to test if a graph is connected.

3. Using the graph of Figure 19.50, show that if the order of the vertices is changed in the adjacency lists, the shortest path tree obtained may be different.

4. Modify the breadth-first algorithm to form for each vertex a linked list of the incident edges of the shortest path tree found by the algorithm. Also, store with each vertex its parent vertex in the shortest path tree.
5. For an undirected graph, its diameter is the distance between the two vertices that are farthest apart (i.e., the length of the longest path, which is a shortest path between two vertices). Give an algorithm to calculate the diameter of an undirected graph. It is suggested that you use an approach that employs several BFS. What is the time complexity of your algorithm?
19.7.2 Depth-First Search
Suppose that a person is placed in some interconnected system of caves, or at some given intersection in a maze, with the objective to find his or her way out. This problem can be modeled as a graph problem with a vertex for each intersection of two or more passageways and an edge for each passageway connecting two intersections. Thus, the problem is one of finding a path from a specific start vertex in a graph to another specific vertex, the exit. Several approaches can be used in this search. One approach that would probably not be used is a BFS. A breadth-first strategy only goes one edge forward before backtracking (tracing an edge backward that has already been traced forward) to try other alternatives. Intuitively, such an approach does not appear to be that promising, as it retraces edges backward far too often.

A much more likely strategy is to begin at the start point and follow a path as long as possible, only backtracking when necessary. In particular, when starting out, select an edge, say, the rightmost one, and follow it to the next intersection. When arriving at a new intersection, the rightmost branch is again taken, and so on, until the desired vertex is found or you reach an intersection that was reached before. In the latter case, it is possible to continue from there, but this makes it difficult to know what part of the graph has been investigated. When a vertex is reached that has been previously reached, a better approach is to backtrack to the previous intersection. After going back on the edge to the previous intersection, select another branch that has not been investigated yet. If all branches at this intersection have been explored, again backtrack to the next previous intersection, and so on. Now, continue in the same fashion until reaching the destination or until there is nowhere new to investigate. The forward and backtrack moves are essentially the Depth-First Search (DFS) method that we now describe and study in detail.
A DFS can be used to perform a traversal of a general graph. It works by always trying to extend the current path to reach new vertices. As each new vertex is reached, it is marked so that it is identified as no longer new. The general DFS strategy is as follows: First, the start vertex s, which was passed in as an argument, is marked. The initial current path is the length-zero path from s to itself. An unmarked vertex adjacent to s is now selected, marked, and added to the path. It becomes the new current vertex for a recursive call. Note that the start vertex may be left with unexplored edges for the present. The search continues by selecting an unmarked vertex adjacent to the current vertex. The current path ends when the current vertex has all adjacent vertices already marked. Then the search removes the current vertex from the end of the current path and returns to the previous vertex (i.e., it backtracks). At the previous vertex, the search moves to an unmarked vertex if one exists, or else backtracks again. This process is continued in a recursive manner until all vertices are marked.

A general algorithm for a DFS from vertex s consists of a main method that calls a recursive scan procedure, which is shown in Figure 19.53.

    Procedure dfs(s)
        mark all vertices in the graph as not reached
        invoke scan(s)

    Procedure scan(s)
        mark and visit s
        for each neighbor w of s
            if the neighbor w is not reached
                invoke scan(w)

Figure 19.53. General algorithm for a DFS from s

The dashed arrows of Figure 19.54 show a DFS starting at v1 for the graph given in Figure 19.50. We now trace the DFS. The algorithm starts with vertex v1 being marked and visited. The first vertex in the adjacency list for v1 is v2. We mark and visit this vertex. Now continuing the search from v2, the first vertex in the adjacency list for v2 is v1, but the vertex has already been visited. Consequently, the next vertex to be marked and visited is v3, as it is the next vertex adjacent to v2. The search is now continued from vertex v3, where it is discovered that the first two vertices in its adjacency list, v1 and v2, have already been visited. Therefore, the next vertex in v3's adjacency list, v4, is the next vertex to be marked and visited in the search. Continuing from v4, since the first two vertices in the adjacency list for v4 have been visited, the next vertex to be marked and visited is v5. The first unvisited vertex in the adjacency list for v5 is v9. We then mark and visit it. At this point, there are no new vertices adjacent to v9 to be explored. We return to the next most recently visited vertex, v5, and find that all of its adjacent vertices have been visited. We then return to vertex v4 and find that vertex v8 has not been visited. We next mark and visit v8. As it has no unmarked neighbors, we return to v4. As there are no more unmarked vertices adjacent to v4, we return to vertex v3, which results in the marking and visiting of v7. Because neither v7 nor v3 has unmarked neighbors, a return to v2 occurs, which results in the marking and visiting of vertex v6. Finally, a return to v1 occurs, and the search ends, since it has no unmarked neighbors.

[Figure 19.54: drawing of the graph on vertices v1 through v9, with dashed arrows tracing the search and a number in each vertex giving the order in which it was reached]

Figure 19.54. The DFS traversal of a simple connected undirected graph

The graph traversal is shown in Figure 19.54, in which the dashed arrows show a trace of the vertices visited and the number in each vertex indicates the order in which the vertices were reached. In particular, v1 and v6 were the first and last vertices to be visited, respectively. The sequence of visited vertices is

    v1, v2, v3, v4, v5, v9, v5, v4, v8, v4, v3, v7, v3, v2, v6, v2, v1

In this sequence, a vertex is repeated when the algorithm backtracks to it to find other
alternatives. As already stated, we must mark vertices in a DFS. If this is not done, a graph that contains a cycle would cause the algorithm to go into an infinite loop! As was the case in a BFS, marking a vertex prevents a vertex from being revisited.

Compared with the BFS strategy, which goes for breadth, the DFS attempts to go as far as possible from the start vertex. Since the vertices adjacent to a given vertex are needed during the traversal, again, the graph representation that yields the most efficient algorithm is the adjacency lists representation. The class SearchGraphLinkedRepUos will again be used. The recursive procedure to perform the DFS of a graph is shown in Figure 19.55.

    /** Depth-first search of the graph.
        Analysis: Time = O(n + m)
        @param v The start vertex for the depth first search */
    public void dfs(V v)
    {
        setAllUnreached();
        scan(v);
    }

    /** Scan v and all of its dfs descendants (used for dfs).
        Analysis: Time = O(n + m), where n = number of vertices,
                  m = number of edges
        @param v The vertex whose adjacency list is to be scanned */
    protected void scan(V v)
    {
        CursorPositionUos position = currentPosition();
        v.setReached(true);
        eGoFirst(v);
        while (!eAfter())
        {
            if (!adjItem().reached())
                scan(adjItem());
            eGoForth();
        }
        goPosition(position);
    }

Figure 19.55. Methods for performing a DFS of a graph

The implementation follows the general algorithm, using the same setAllUnreached() procedure as for the BFS, and using the edge iterator. Note that the current position in the graph is saved before starting an edge iteration and restored after the iteration. To see why this is needed, note that the traversal starts with the graph edge iterator traversing the adjacency list of s, and the algorithm does a recursive call to another vertex. At that vertex, an edge iteration is also needed. However, there is only one graph object and only one edge iterator within it. If a client attempts to use two or more iterators at the same time, they will interfere with each other: the second one will lose the state of the first one. Therefore, it is necessary to save the current position before starting a new edge iteration, and restore it after the iteration is complete. If a client needs to have two iterations active at the same time, a shallow clone of the graph should be obtained. Then, the client can use one iterator in the original graph and one in the clone.

Now, consider the time required to do the search. Since the procedure scan() is recursive, its timing analysis requires the use of a technique for recursive methods.
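Before the timing analysis, the roles of marking and of the saved position can be seen in a stand-alone sketch. It is only an illustration under stated assumptions: the graph is a hypothetical adjacency-lists structure built from java.util collections rather than SearchGraphLinkedRepUos, and the names DfsSketch, dfsOrder, and scan are introduced here. Because each for loop below creates its own iterator, a recursive call cannot disturb the iteration of its caller, so no counterpart of currentPosition() and goPosition() is needed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone DFS on adjacency lists; not the dslib classes of the text. */
class DfsSketch {
    /** Returns the vertices in the order in which a DFS from start first reaches them. */
    static List<Integer> dfsOrder(List<List<Integer>> adj, int start) {
        boolean[] reached = new boolean[adj.size()];   // all false, as after setAllUnreached()
        List<Integer> order = new ArrayList<>();
        scan(adj, start, reached, order);
        return order;
    }

    private static void scan(List<List<Integer>> adj, int v,
                             boolean[] reached, List<Integer> order) {
        reached[v] = true;          // mark before exploring, so a cycle cannot recurse forever
        order.add(v);
        for (int w : adj.get(v)) {  // this loop owns its own iterator over v's list
            if (!reached[w]) {
                scan(adj, w, reached, order);
            }
        }
    }
}
```

With a single shared edge cursor, as in the graph class of the text, the same recursion is correct only because scan() saves the position on entry and restores it on exit.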
To analyze the recursive scan() procedure, we use the approach of determining the time when recursive calls are ignored. Thus, let $T^{R}_{scan}(v)$ be the time for the scan() procedure with parameter v, ignoring the time for recursive calls. Then $T^{R}_{scan}(v) = degree(v)$, as the procedure considers each incident edge and we are assuming an adjacency lists representation. We next determine what calls are made of procedure scan(). As soon as scan() is called for a vertex v, v is marked reached. Also, calls to scan() are only made for unreached vertices. Thus, scan() is called at most once for every vertex. Therefore, the total time is given by

\begin{align*}
T_{dfs} &= \text{time for setAllUnreached} + \text{time for all calls to scan} \\
        &= capacity + \sum_{\text{all calls to scan}} \text{time for each call (ignoring recursion)} \\
        &= capacity + \sum_{v \in V} T^{R}_{scan}(v) \\
        &= capacity + \sum_{v \in V} degree(v) \\
        &= capacity + 2m
\end{align*}

Note that in an undirected graph, summing the degrees of all the vertices counts each edge twice; therefore, the value of the summation is 2m. Assuming that capacity = n, the worst-case time complexity for DFS is O(n + m).

Just as a BFS can be used to find a shortest path tree, a DFS can be used to find a tree called the DFS tree. The tree consists of the vertices reached by the search, and the edges encountered in the search that extend from a vertex being scanned to an unreached vertex. In Figure 19.54, the dashed edges show the edges of the DFS tree. A DFS tree tends to be long and stringy, whereas a BFS tree tends to be short and bushy. Note that for both types of searches, the tree obtained depends on the representation of the graph. In particular, if the order of the vertices is changed in the adjacency lists, then likely a different tree will be obtained. Depending on the application, such differences in the trees might not be important.

Cycles play an important role in graphs; in particular, there are applications where cycles are not allowed. For example, in a scheduling network as described in Section 19.1, a sequence of vertices that forms a cycle is forbidden. Also, a tree is a special kind of graph in which cycles are not allowed. Finally, in many applications where cycles are permitted, it is important to be able to find them. Thus, in many situations, it is important to be able to test whether a cycle exists and, if so, to find one. The DFS approach is very useful for finding a cycle in a graph, so we now examine this problem.

Given a directed graph, our objective will be to find a cycle containing a specified vertex s. Note that if a DFS is started at s, a path is found from s to some vertex p, and an edge is found from p to s, then a cycle has been found that contains s. The cycle consists of the search path from s to p, followed by the edge (p, s). If no such path and edge are found, no cycle exists that contains s.
A function for doing a cycle test, isInCycle(s), is given in Figure 19.56. This function and the method that it calls, scanForPath(), are very similar to the procedures dfs() and scan() given in Figure 19.55. This time, however, the scan stops when it gets back to the start vertex (i.e., it finds a cycle). The method saves and restores the graph position in order not to destroy the state of any ongoing iterations. Observe that such a procedure only detects a cycle that involves vertex s. In general, a graph can contain many cycles. The detection of all cycles in a graph is left as a problem. Another important aspect of detecting cycles is to determine and display the vertices that lie on a cycle. Function isInCycle() can be altered to generate the additional information. Note that the graph class given in the figure only has the constructor for an empty graph. If the other constructors are desired, they need to be added.

    import dslib.graph.*;
    import dslib.base.CursorPositionUos;

    /** A directed graph with a function to determine whether a
        specified vertex is within a cycle. */
    public class CycleGraphLinkedRepUos
        extends GraphLinkedRepUos<SearchVertexUos, EdgeUos<SearchVertexUos>>
    {
        /** Construct a new directed graph with capacity for up to cap vertices.
            Analysis: Time = O(cap) */
        public CycleGraphLinkedRepUos(int cap)
        {
            super("dslib.graph.SearchVertexUos", "dslib.graph.EdgeUos", cap, "directed");
        }

        /** Is there a cycle that includes s?
            Analysis: Time = O(n + m) */
        public boolean isInCycle(SearchVertexUos s)
        {
            setAllUnreached();
            return scanForPath(s, s);
        }

        /** Set all the vertices to unreached.
            Analysis: Time = O(n), where n = number of vertices */
        public void setAllUnreached()
        {
            GraphPositionUos position = currentPosition();
            goFirst();
            while (!after())
            {
                item().setReached(false);
                goForth();
            }
            goPosition(position);
        }

        /** Test for a path from p that ends with an edge back to the origin vertex.
            Analysis: Time = O(n + m), where n = number of vertices,
                      m = number of edges */
        protected boolean scanForPath(SearchVertexUos p, SearchVertexUos origin)
        {
            CursorPositionUos position = currentPosition();
            boolean found = false;
            p.setReached(true);
            eGoFirst(p);
            while (!found && !eAfter())
                if (!adjItem().reached())
                {
                    /* Continue the search from the unreached adjacent vertex. */
                    found = scanForPath(adjItem(), origin);
                    if (!found)
                        eGoForth();
                }
                else if (adjItem() == origin)
                    found = true;
                else
                    /* As adjacent vertex is reached, it has already been
                       unsuccessfully searched. */
                    eGoForth();
            goPosition(position);
            return found;
        }

        /** A unique class identifier for serializing and deserializing. */
        public static final long serialVersionUID = 1L;
    }

Figure 19.56. A graph class with a function to detect a cycle for a vertex

The detection of a cycle for s in an undirected graph is basically the same, except that the path s, p, s does not qualify as a cycle. The modification of the cycle detection methods to detect this situation is left to the problems.

We shall now show how a simple digraph can be used to represent the resource allocation status of an operating system. This example is chosen because of the fundamental importance of the operating system in almost every conceivable computer system. Furthermore, cycle detection and elimination is necessary to ensure smooth operation of an operating system.

In a multiprogrammed computer system, it appears as if several programs are executed at one time. In reality, the programs occupy different parts of memory and take turns using the resources of the computer system, such as the central processing unit, printers, tape units, and disk devices. A special set of programs called the operating system controls the allocation of these resources to the programs. When a program requires the use of a certain resource, it issues a request for that resource, and the operating system must ensure that the request is eventually satisfied. It is assumed that all resource requests of a program must be satisfied before the program can complete execution. If any requested resources are unavailable at the time of the request, the program obtains control of those resources that are available, but must wait for the unavailable resources. Let P_t = {p1, p2, ..., pm} represent the set of programs in the computer system at time t.
Let A_t ⊆ P_t be the set of active programs, or programs that have been allocated at least a portion of their resource requests at time t. Finally, let R_t = {r1, r2, ..., rn} represent the set of resources in the system at time t. An allocation graph G_t is a directed graph representing the resource allocation status of the system at time t. It consists of the set of nodes V = R_t ∪ P_t and a set of edges E. There is a directed edge from node p_i to r_j if program p_i is waiting for resource r_j. In addition, there is a directed edge from r_k to p_l if resource r_k is presently held by program p_l. For example, let R_t = {r1, r2, r3, r4}, A_t = {p1, p2, p3, p4}, and let the resource allocation status be as follows:

    p1  requires r1
    p2  has resources r3 and r4, and requires r2
    p3  has resource r2 and requires r1 and r3
    p4  has resource r1 and requires r4
Then, the allocation graph at time t is given in Figure 19.57; observe that this graph is a bipartite graph. It can happen that requests for resources are in conflict. For example, in Figure 19.57, program p2 has control of resource r3 and requires resource r2, but program p3 has control of resource r2 and requires resource r3. In this case, both programs are sitting and waiting for the other to release a resource, so neither is accomplishing anything. Such a state in a computer system is called deadlock.

[Figure 19.57: bipartite drawing with program nodes p1 to p4 and resource nodes r1 to r4, joined by the waiting and holding edges of the allocation status given above]

Figure 19.57. Allocation graph for detecting deadlocks

The nature of deadlock is best explained through graph theory. First of all, a program x must wait for program y to complete if y holds a resource that x needs (i.e., there is a path from x to a resource held by y). A sequence of waiting programs can be longer, as x might be waiting for a resource held by z, whereas z is waiting for a different resource held by y. In general, x must wait for y to release a resource if there is a path from x to y. If x waits for y and if y waits for x, that is, if there is a cycle involving x and y, there is then a deadlock. For the conflict between p2 and p3 described above, the cycle is p2, r2, p3, r3, p2. Another example of deadlock in Figure 19.57 is the cycle p4, r4, p2, r2, p3, r1, p4. In summary, the key to deadlock detection is cycle testing, and deadlock removal is cycle breaking. The DFS algorithm just completed is useful for performing these tasks.
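The deadlock test can be tried on the allocation status of the example. The sketch below is an illustration under stated assumptions, not the dslib class of Figure 19.56: it stores the graph in a java.util map keyed by node name, and the names AllocationGraphSketch, addEdge, and example are hypothetical. The edges follow the rule given earlier: a waiting edge goes from a program to a resource, and a holding edge goes from a resource to a program.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-alone allocation graph with a cycle test for one node. */
class AllocationGraphSketch {
    private final Map<String, List<String>> adj = new HashMap<>();

    /** Adds the directed edge (from, to), creating either node if it is new. */
    void addEdge(String from, String to) {
        adj.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        adj.computeIfAbsent(to, k -> new ArrayList<>());
    }

    /** Is there a directed cycle that contains s? */
    boolean isInCycle(String s) {
        return scanForPath(s, s, new HashSet<>());
    }

    /** Depth-first scan from p, looking for an edge that leads back to origin. */
    private boolean scanForPath(String p, String origin, Set<String> reached) {
        reached.add(p);
        for (String w : adj.getOrDefault(p, new ArrayList<>())) {
            if (w.equals(origin)) {
                return true;                    // edge back to the origin closes a cycle
            }
            if (!reached.contains(w) && scanForPath(w, origin, reached)) {
                return true;
            }
        }
        return false;
    }

    /** Builds the allocation status listed in the example. */
    static AllocationGraphSketch example() {
        AllocationGraphSketch g = new AllocationGraphSketch();
        g.addEdge("p1", "r1");                          // p1 requires r1
        g.addEdge("r3", "p2"); g.addEdge("r4", "p2");   // p2 has r3 and r4
        g.addEdge("p2", "r2");                          // p2 requires r2
        g.addEdge("r2", "p3");                          // p3 has r2
        g.addEdge("p3", "r1"); g.addEdge("p3", "r3");   // p3 requires r1 and r3
        g.addEdge("r1", "p4");                          // p4 has r1
        g.addEdge("p4", "r4");                          // p4 requires r4
        return g;
    }
}
```

On this graph, isInCycle reports a cycle for p2, p3, and p4 but not for p1: no edge enters p1, so p1 waits on deadlocked programs without lying on a cycle itself.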
Problems 19.7.2
1. Trace the DFS algorithm on the graph of Figure 19.54 when the test for if (!adjItem().reached()) is omitted and the statements within the if are always done. What happens?
2. Trace the DFS algorithm on the graph of Figure 19.54 when the statement goPosition(position) is omitted. Be sure to keep track of where the edge iterator is positioned in the graph. What happens?
3. Given an undirected graph, which is not necessarily connected, give an algorithm based on the DFS approach to test whether a graph is connected.
4. Suppose in the DFS of Figure 19.55, the adjacency matrix representation is used instead of adjacency lists representation. What is the time requirement of the algorithm now?
5. Suppose in the BFS of Figure 19.52, a stack is used instead of a queue. Is the resulting search similar to a search that you have seen before? Is it exactly the same?
6. Using the graph of Figure 19.54, show that if the order of the vertices is changed in the adjacency lists, the DFS tree obtained may be different.
7. Modify the DFS algorithm to form a linked list of the edges of the DFS tree found by the algorithm. Also, store with each vertex its parent vertex in the DFS tree, and compute its distance from the root in the tree.
8. Modify the cycle detector methods to store the actual cycle when it exists. Use a linked list to store the sequence of vertices that forms the cycle.
9. Modify the cycle detector methods to find a cycle in an undirected graph that contains s. Be careful to avoid just tracing forward and backward on the same edge, as this does not form a cycle in an undirected graph.
[Figure 19.58: drawing of a disconnected directed graph with numbered nodes; not reproduced here]

Figure 19.58. A disconnected digraph for searching

10. What is the minimum number of edges that must be removed in Figure 19.57 to break all cycles? Which edges will do this?
11. Consider doing a DFS of a connected undirected graph:
    (a) Give an argument to show that when an edge of the graph is first considered either (i) it joins the current node to an unreached node, or (ii) it joins the current node to a reached node on the path of the DFS tree from the original node to the current node. In particular, show that it will not join the current node to a node in some other branch of the DFS tree or to a descendant in the DFS tree. Note that this applies only when the edge is considered the first time.
    (b) In case (i) of part (a), the edge is classified as a tree edge, and in case (ii), it is classified as a back edge. Give a variation of the DFS algorithm to classify each edge of an undirected graph as a tree edge or a back edge. Be careful not to classify any edge twice.
12. Trace a DFS of the graph in Figure 19.58. In the dfs() procedure, start by making all nodes unreached and then iterate through all the vertices in index order. When the iteration gets to a vertex not yet reached in a search, do a scan from it. As the trace is done, draw the forest (a collection of trees) formed by including each edge from a current node to an unreached node. Why is a forest obtained rather than a single tree or a graph?
13. Consider doing a DFS of a directed graph. When an edge of the graph is considered in a directed graph, either (i) it joins the current node to an unreached node, (ii) it joins the current node to a reached node on the path of the DFS tree from the original node to the current node, (iii) it joins the current node to a reached node on a previous branch of the DFS tree or to a node in a previous tree, or (iv) it joins the current node to a descendant in the DFS tree. These edges are called tree edges, back edges, cross edges, and forward edges, respectively. Give a variation of the DFS algorithm to classify each edge of a directed graph into one of these four types.
14. Consider doing a BFS of an undirected graph. What classes of edges are encountered? See the earlier problem for some classes of edges. Define and add any new classes needed. Give a variation of a BFS to classify each edge when it is first encountered.
19.8 Applications
This section will look at a few applications of graphs. Often, a node represents some entity, and a path represents a form of connection between the entities at the ends of the path; for example, a highway route or an e-mail transmission route. The first subsection is concerned with which pairs of nodes are connected (i.e., the existence of a path from the first node of the pair to the second). Connectivity concepts are presented along with algorithms to determine graph connectivity. The second subsection discusses the selection of a minimum number of edges so as to connect a set of nodes. When the edges must be selected from an underlying graph, the nodes and edges form a spanning tree. If the edges have weights and the objective is to minimize the sum of the weights for the edges selected, then the result is a minimum spanning tree. Algorithms are given for both these problems. The third subsection deals with a special ordering of the nodes of a directed graph with no cycles. Such an ordering makes it easy to determine a number of properties of this type of graph, for example, the longest path in the graph. The fourth subsection studies scheduling the activities of a project, as in Example 19.6 on page 845. The scheduling process results in a directed graph without any cycles. By using the special ordering of the third subsection, an efficient algorithm is obtained to determine the time constraints on completing each activity to complete the project in the minimum time. The last subsection considers the use of graphs in testing software code. It presents the techniques of basis-path and data-flow testing.

[Figure 19.59: drawing of an undirected graph on nodes 1 to 11 that is not connected; not reproduced here]

Figure 19.59. Example of a disconnected graph
19.8.1 Connectivity and Components
The concept of connectedness of nodes or vertices in a graph was introduced in Section 19.4. This section will first review that concept and then consider some related algorithmic issues. First, recall the definition of connected in an undirected graph:

Definition 19.12: An undirected graph is said to be connected if for any pair of nodes of the graph, the two nodes are reachable from one another (i.e., there is a path from one of the nodes to the other node).

The preceding example of a disconnected graph is repeated in Figure 19.59. For an undirected graph, two nodes are said to belong to the same connected component if one can reach the other. Each connected component is a subgraph of the graph that is as large as possible, subject to the constraint that each node of the component can reach any other node of that component. Thus, the graph in Figure 19.59 has the following four connected components:

    G1 = ({1, 2, 7, 8}, {{1, 2}, {1, 7}, {2, 8}, {7, 8}})
    G2 = ({3, 9}, {{3, 9}})
    G3 = ({4, 5, 10, 11}, {{4, 5}, {4, 10}, {4, 11}, {5, 10}, {5, 11}, {10, 11}})
    G4 = ({6}, { })
If either a BFS or a DFS is started at a node of an undirected graph, the search will reach all nodes in the connected component of the node. In this way, the connected component of a node is easily found. All connected components of a graph can be found by the following general algorithm, based on a DFS:
[Figure 19.60: drawing of a disconnected directed graph with numbered nodes; not reproduced here]

Figure 19.60. A disconnected directed graph

    mark all vertices in the graph as not reached
    for each vertex v of the graph
        if v is not reached
            invoke scan(v)
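The general algorithm above can be sketched in stand-alone Java. This is an illustration under stated assumptions: it uses java.util lists rather than the dslib graph classes, the names ComponentsSketch, undirected, label, and figure1959 are hypothetical, and a component number of 0 stands in for "not reached". The test graph is the one described by the sets G1 to G4 listed for Figure 19.59.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone labeling of connected components by repeated DFS scans. */
class ComponentsSketch {
    /** Builds adjacency lists for an undirected graph on vertices 1..n (index 0 is unused). */
    static List<List<Integer>> undirected(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v <= n; v++) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    /** Returns comp, where comp[v] is the number (1, 2, ...) of the component containing v. */
    static int[] label(List<List<Integer>> adj) {
        int n = adj.size() - 1;
        int[] comp = new int[n + 1];        // 0 means "not reached"
        int count = 0;
        for (int v = 1; v <= n; v++) {
            if (comp[v] == 0) {             // v starts a new component
                count++;
                scan(adj, v, count, comp);
            }
        }
        return comp;
    }

    private static void scan(List<List<Integer>> adj, int v, int id, int[] comp) {
        comp[v] = id;                       // marking and labeling in one step
        for (int w : adj.get(v)) {
            if (comp[w] == 0) {
                scan(adj, w, id, comp);
            }
        }
    }

    /** The graph given by the node and edge sets of G1 to G4. */
    static List<List<Integer>> figure1959() {
        return undirected(11, new int[][] {
            {1, 2}, {1, 7}, {2, 8}, {7, 8}, {3, 9},
            {4, 5}, {4, 10}, {4, 11}, {5, 10}, {5, 11}, {10, 11}});
    }
}
```

Because the marks are never cleared between scans, each vertex is scanned once, and the labels partition the nodes into the four components.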
Note that the reached field is not reinitialized after each scan. Each scan is started from an unreached vertex, that is, from a vertex that has not yet been placed in a component. The time for this process is proportional to the number of nodes and edges in the whole graph.

The connectivity concept is a little more complex in a directed graph, since even if a node u is reachable from node v by a directed path, the node v may not be reachable from u by a directed path. There are actually three concepts of connectedness in a directed graph. Given two nodes, if there is a path from each one to the other, then they are called strongly connected, which is equivalent to saying there is a cycle that contains them both. If one of them can reach the other, but the other cannot necessarily reach the first one, they are called unilaterally connected. Finally, two nodes are called weakly connected if they are connected in the undirected graph formed by replacing each directed edge by the corresponding undirected edge. Observe that, if a pair of nodes is unilaterally connected, they are weakly connected. But a weakly connected pair is not necessarily unilaterally connected. Furthermore, a strongly connected pair is both unilaterally and weakly connected.

Definition 19.13: A subgraph G1 is said to be maximal with respect to some property if the subgraph cannot have vertices or edges added to it and still have the property (i.e., no other subgraph of the whole graph has the property and also includes the G1 subgraph).

For a simple digraph, a maximal strongly connected subgraph is called a strongly connected component. Similarly, a maximal unilaterally connected subgraph is called a unilaterally connected component, and a maximal weakly connected subgraph is called a weakly connected component. For the digraph given in Figure 19.60, {1}, {2, 3, 8}, {4}, and {5, 6, 7, 9} are the strongly connected components. The unilaterally connected components are {1, 2, 3, 8}, {4, 2, 3, 8}, and {5, 6, 7, 9}. The weakly connected components are {1, 2, 3, 4, 8} and {5, 6, 7, 9}.

Note that every node of the digraph lies in exactly one strongly connected component. But an edge e ∈ E of a digraph may or may not be contained in a strongly connected component. If e = (u, v) and both u and v are in the same strongly connected component S, e is in the strongly connected component. For the weakly connected components of a digraph, every node and edge is contained in exactly one weakly connected component. Finally, every node and edge is in at least one unilaterally connected component of a graph. As the preceding example shows, it is possible for nodes and edges to belong to more than one unilaterally connected component.

The determination of the weakly connected components of a directed graph is easy: convert the digraph to an undirected graph and find its connected components. A search from a node will find a unilaterally connected subgraph that contains it, but it will not necessarily find the whole unilaterally connected component for the node. A carefully constructed DFS from a node can be used to find the strongly connected component for the node. To do the task, it is necessary to determine those nodes that have a path back to the originating node. The key to this search is for each node v to keep track of how close to the root of the DFS tree can be reached by a special type of path from v. Such a special path can follow some edges down the DFS tree from v, and then it must have as its last edge an edge not in the DFS tree to an ancestor of v in the tree. There may be no paths of this type, but if they exist, it is useful to know how close to the root they can reach. The investigation of the algorithm is left as an exercise.
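The three notions of connectedness for a pair of nodes can be checked directly from their definitions. The sketch below is only an illustration under stated assumptions: it tests a single pair of nodes by reachability searches and does not compute components, it uses java.util lists instead of the dslib classes, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pairwise tests of strong, unilateral, and weak connectedness. */
class DigraphConnectivitySketch {
    /** Builds adjacency lists for a directed graph on vertices 0..n-1. */
    static List<List<Integer>> digraph(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < n; v++) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
        }
        return adj;
    }

    /** Replaces each directed edge by the corresponding undirected edge. */
    static List<List<Integer>> undirectedVersion(List<List<Integer>> adj) {
        List<List<Integer>> both = new ArrayList<>();
        for (int v = 0; v < adj.size(); v++) {
            both.add(new ArrayList<>());
        }
        for (int u = 0; u < adj.size(); u++) {
            for (int w : adj.get(u)) {
                both.get(u).add(w);
                both.get(w).add(u);
            }
        }
        return both;
    }

    /** Is there a directed path (possibly of length zero) from u to v? */
    static boolean reaches(List<List<Integer>> adj, int u, int v) {
        return scan(adj, u, v, new boolean[adj.size()]);
    }

    private static boolean scan(List<List<Integer>> adj, int current, int target,
                                boolean[] reached) {
        if (current == target) {
            return true;
        }
        reached[current] = true;
        for (int w : adj.get(current)) {
            if (!reached[w] && scan(adj, w, target, reached)) {
                return true;
            }
        }
        return false;
    }

    static boolean stronglyConnected(List<List<Integer>> adj, int u, int v) {
        return reaches(adj, u, v) && reaches(adj, v, u);
    }

    static boolean unilaterallyConnected(List<List<Integer>> adj, int u, int v) {
        return reaches(adj, u, v) || reaches(adj, v, u);
    }

    static boolean weaklyConnected(List<List<Integer>> adj, int u, int v) {
        return reaches(undirectedVersion(adj), u, v);
    }
}
```

For the hypothetical digraph with edges 0→1, 1→0, 1→2, and 3→2, nodes 0 and 1 are strongly connected, nodes 0 and 2 are unilaterally but not strongly connected, and nodes 0 and 3 are only weakly connected.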
Problems 19.8.1
1. Give an algorithm based on a DFS to print out the nodes of each connected component of an undirected graph. Include with each component a count of the number of nodes in the component.
2. Find the strongly, unilaterally, and weakly connected components of the directed graph in Figure 19.58.
3. Give an algorithm based on a DFS to determine the strongly connected components of a directed graph. What is the time bound for your algorithm?
19.8.2 Spanning Trees
Like a graph, a tree consists of nodes and edges. As described in Chapter 13, access to the nodes in a tree is via its root and paths leading from the root. Thus, edges are followed away from the root. Therefore, even though we did not use directed edges, we treated the edges of a tree as if they were directed away from the root. Hence, as described in Chapter 13, trees are really a special case of directed graphs. Because of the special role of the root, these trees are called rooted trees. When doing a BFS, the shortest paths tree found during the search is a rooted tree. Similarly, the DFS tree found in a DFS is a rooted tree. A graph can have many other rooted trees.

As described in Section 13.4, when there is no restriction on the number of children of a node, a tree is called a general tree or an m-ary tree. In that section, two representations were presented for general trees. The first was based on storing an array of subtrees at each node. If all these arrays are put together, they form a matrix similar to an adjacency matrix. The matrix formed is the adjacency matrix for the directed graph corresponding to the rooted tree. Of course, the directed graph allows access from any node, not just the root. The other standard representation of a general tree was by transforming it into a binary tree. In the binary tree, the left reference of a node is to the first child of the corresponding node of the general tree, and the right reference of a node is to the next sibling of the corresponding node of the general tree. If from a node a left reference is taken and then a sequence of right references is taken, the sequence is like a linked list of the children of the node in the general tree. The linked list is similar to the adjacency list of a node in the adjacency lists representation of the directed graph corresponding to the rooted tree. Thus, the binary tree representation of a general tree is like the adjacency lists representation of the corresponding graph. Again, the graph representation allows access to any node, whereas the general tree representation only allows access to the root.

A connected undirected graph with no cycles is called a free tree. A free tree is usually stored by using the standard adjacency matrix or adjacency lists representation for the graph. Of course, the tree found in a BFS or a DFS can be stored as a free tree. As usual, the representation that should be selected is the one that best fits the application.

Both the shortest paths tree and the DFS tree of a connected undirected graph are special cases of what is known as a spanning tree. A spanning tree of a connected undirected graph is a subgraph that includes all the nodes of the graph, is connected, and has no cycles. Thus, it is a tree that has edges from the graph and spans all the vertices of the graph. Any given graph may have many different spanning trees. Which one is best depends upon the application; in some cases, shortest paths from the root are wanted, so the shortest paths tree is used. In other cases, long paths are wanted, so the DFS tree might be used. Other algorithms can be developed to yield other spanning trees. Note that if the graph is not connected, then a spanning tree can be found for each connected component of the graph.

Many directed graphs do not have any spanning rooted trees. To have a spanning rooted tree, the directed graph must have a node, which will be the root, from which all other nodes can be reached. When this condition is satisfied, a BFS, a DFS, or some other means can be used to find a spanning tree.

There are many applications that can be modeled by graphs in which each edge has an associated weight (label). For example, in an airline application, the nodes are cities, and the weighted edges may denote the cost of flying an airplane between pairs of cities.
In a computer network application, the nodes may be the computer centers in the network, and the weight of an edge is the distance or leased line cost between the ends of the edge. An important problem in such networks is to obtain a spanning tree so that the sum of the weights of the edges in the tree is a minimum. Such a tree represents the best way to connect the centers together by a collection of edges that minimizes the flight or communication costs. This notion leads us to the next definition:

Definition 19.14: Given a weighted connected undirected graph, a minimum spanning tree is a spanning tree of the graph in which the sum of the weights of the tree's edges is a minimum from among all spanning trees.

For example, Figure 19.61a repeats the graph in Figure 19.2, which represents the distances in kilometers among the major cities in Western Canada. As we will see, the corresponding minimum spanning tree is the free tree of Figure 19.61b, which represents the least costly way to connect all the cities. It is easy to see that it does not necessarily have the least costly path between a pair of cities. For example, the least costly path from Calgary to Vancouver is the direct route, not the one through Edmonton and Prince George.

There are several algorithms to determine the minimum spanning tree for a weighted graph. We will only give one of them, Kruskal's algorithm, as the others are somewhat more complex. The basis of Kruskal's algorithm is to observe that the objective is to select least costly edges, so as to connect the nodes of the graph. We do not want to select so many edges that a cycle is formed, since the most costly edge of a cycle can be removed and the nodes remain connected. If the edges are selected in the order of least cost to most cost, then no edge should be selected when it would form a cycle. Therefore, the general algorithm of Figure 19.62 is obtained.
For the graph of Figure 19.61, Table 19.3 shows the processing of the edges in increasing order by their weight.

Table 19.3. Consideration of Edges by Kruskal's Algorithm
Weight  Edge                         Action
150     {Victoria, Vancouver}        add the edge, as Victoria and Vancouver are not yet connected
160     {Prince Albert, Saskatoon}   add the edge, as Prince Albert and Saskatoon are not yet connected
258     {Saskatoon, Regina}          add the edge, as Saskatoon and Regina are not yet connected
292     {Edmonton, Calgary}          add the edge, as Edmonton and Calgary are not yet connected
526     {Saskatoon, Edmonton}        add the edge, as Saskatoon and Edmonton are not yet connected
578     {Regina, Winnipeg}           add the edge, as Regina and Winnipeg are not yet connected
597     {Edmonton, Prince Albert}    discard the edge, as it forms the cycle Edmonton, Saskatoon, Prince Albert, Edmonton
612     {Calgary, Saskatoon}         discard the edge, as it forms the cycle Saskatoon, Edmonton, Calgary, Saskatoon
638     {Prince George, Edmonton}    add the edge, as Prince George and Edmonton are not yet connected
765     {Regina, Calgary}            discard the edge, as it forms the cycle Regina, Saskatoon, Edmonton, Calgary, Regina
806     {Vancouver, Prince George}   add the edge, as Vancouver and Prince George are not yet connected
843     {Saskatoon, Winnipeg}        discard the edge, as it forms the cycle Saskatoon, Regina, Winnipeg, Saskatoon
1056    {Vancouver, Calgary}         discard the edge, as it forms the cycle Vancouver, Prince George, Edmonton, Calgary, Vancouver
1245    {Vancouver, Edmonton}        discard the edge, as it forms the cycle Vancouver, Prince George, Edmonton, Vancouver

Figure 19.61. A connected undirected graph (a) and its minimum spanning tree (b)

Procedure minimumSpanningTree
    sort the edges into increasing order by their weight
    while not all the nodes are connected
        select the next edge from the ordering
        if the ends of the edge are already connected, i.e., adding the edge to the collection would form a cycle
            discard the edge
        else
            add the edge to the partial tree
Figure 19.62. General algorithm for Kruskal's algorithm

The only nonstandard part of this algorithm is determining whether the ends of an edge are already connected. However, a simple DFS or BFS of the edges already in the partial tree will answer the question. An alternative to all these searches is to maintain a collection of sets, where each set contains the nodes already connected to each other (a connected component). If the two ends of the edge are in the same set, the edge should not be added to the partial tree. Otherwise, the edge should be added to the partial tree, and the sets corresponding to the two ends should be unioned. There is a special set data structure useful for this task, but having a simple list for each connected component will do the job. The details of the implementation are left to the problems.
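As an illustration of the set-based variant just described, the following is a minimal sketch rather than a definitive implementation. It assumes that the special set data structure is a union-find (disjoint-set) structure stored in a parent array; the names KruskalSketch, Edge, find, minimumSpanningTree, totalWeight, and westernCanada, as well as the numbering of the cities, are hypothetical and chosen only for this example. The edge data are those of Table 19.3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; class, method, and field names are hypothetical.
class KruskalSketch {
    static final class Edge {
        final int u, v, weight;
        Edge(int u, int v, int weight) { this.u = u; this.v = v; this.weight = weight; }
    }

    // Follow parent links to the representative of the set containing x.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // shorten the path as we go
            x = parent[x];
        }
        return x;
    }

    static List<Edge> minimumSpanningTree(int nodeCount, List<Edge> edges) {
        List<Edge> sorted = new ArrayList<>(edges);
        sorted.sort(Comparator.comparingInt((Edge e) -> e.weight));
        int[] parent = new int[nodeCount];
        for (int i = 0; i < nodeCount; i++) parent[i] = i; // each node starts in its own set
        List<Edge> tree = new ArrayList<>();
        for (Edge e : sorted) {
            if (tree.size() == nodeCount - 1) break; // all the nodes are connected
            int ru = find(parent, e.u);
            int rv = find(parent, e.v);
            if (ru != rv) {       // ends are not yet connected: keep the edge
                tree.add(e);
                parent[ru] = rv;  // union the two sets
            }                     // otherwise the edge would form a cycle and is discarded
        }
        return tree;
    }

    static int totalWeight(List<Edge> edges) {
        int sum = 0;
        for (Edge e : edges) sum += e.weight;
        return sum;
    }

    // Edges of Table 19.3. Node numbering (an assumption of this sketch):
    // 0 Victoria, 1 Vancouver, 2 Prince George, 3 Edmonton, 4 Calgary,
    // 5 Saskatoon, 6 Prince Albert, 7 Regina, 8 Winnipeg.
    static List<Edge> westernCanada() {
        return Arrays.asList(
            new Edge(0, 1, 150), new Edge(6, 5, 160), new Edge(5, 7, 258),
            new Edge(3, 4, 292), new Edge(5, 3, 526), new Edge(7, 8, 578),
            new Edge(3, 6, 597), new Edge(4, 5, 612), new Edge(2, 3, 638),
            new Edge(7, 4, 765), new Edge(1, 2, 806), new Edge(5, 8, 843),
            new Edge(1, 4, 1056), new Edge(1, 3, 1245));
    }

    public static void main(String[] args) {
        List<Edge> tree = minimumSpanningTree(9, westernCanada());
        System.out.println(tree.size() + " edges, total weight " + totalWeight(tree));
    }
}
```

On the edges of Table 19.3, the sketch keeps the same eight edges that the table marks as added, for a total weight of 3408. A list per connected component, as suggested in the text, would work equally well; the parent array is simply one compact way to answer the question of whether two ends are in the same set.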
Problems 19.8.2
1. Implement Kruskal's algorithm when a DFS is used to determine whether the ends of an edge are already connected. What is the worst-case time complexity of your algorithm?
2. Implement Kruskal's algorithm when maintaining a set for each connected component is used to determine whether the ends of an edge are already connected. Give careful thought as to the best representation to use for the sets. What is the time complexity of your algorithm?
3. Another standard algorithm to determine a minimum spanning tree is Prim's algorithm. The basis of this algorithm is to grow the tree from one node. Thus, at the start, one node (an arbitrary one) is in the tree and the others are not. In each iteration of the algorithm, the least costly edge is found that joins a node already in the tree with a node not already in the tree. Next, this edge and the node not already in the tree are added to the tree. The iterations continue until all nodes are in the tree. If such an approach is used on the graph in Figure 19.61a, with Edmonton as the start node, then the edges will be added to the growing tree in the order shown in Table 19.4. Figure 19.63 shows the situation after the third iteration. The figure uses solid lines for the vertices and edges that have already been added to the tree. The dashed nodes are not yet part of the tree, and the dashed edges are the ones that must be considered in the fourth iteration. Note that in this iteration, the edges {Saskatoon, Calgary} and {Edmonton, Prince Albert} are not considered because they are interior to the tree that has already been grown. Moreover, edges like {Victoria, Vancouver} and {Vancouver, Prince George} are not considered, since neither end of these edges is in the tree. An iteration only considers edges from a node in the tree to a node not yet in the tree. Implement the algorithm. Note: Try to find a good way to determine the least costly edge connecting a node in the tree to a node not yet in the tree. What is the time complexity of your algorithm?

Table 19.4. Selection of Edges for the Spanning Tree by Prim's Algorithm
Weight  Edge                          Edge Selection Rule
292     {Edmonton, Calgary}           least costly edge out of {Edmonton}
526     {Edmonton, Saskatoon}         least costly edge out of {Edmonton, Calgary}
160     {Saskatoon, Prince Albert}    least costly edge out of {Edmonton, Calgary, Saskatoon}
258     {Saskatoon, Regina}           least costly edge out of {Edmonton, Calgary, Saskatoon, Prince Albert}
578     {Regina, Winnipeg}            least costly edge out of {Edmonton, Calgary, Saskatoon, Prince Albert, Regina}
638     {Edmonton, Prince George}     least costly edge out of {Edmonton, Calgary, Saskatoon, Prince Albert, Regina, Winnipeg}
806     {Prince George, Vancouver}    least costly edge out of {Edmonton, Calgary, Saskatoon, Prince Albert, Regina, Winnipeg, Prince George}
150     {Vancouver, Victoria}         least costly edge out of {Edmonton, Calgary, Saskatoon, Prince Albert, Regina, Winnipeg, Prince George, Vancouver}
19.8.3
Topological Sorting
Figure 19.63. State of the graph after three iterations of Prim's algorithm

This subsection is concerned with directed graphs that do not contain any cycles. Such a graph is called a directed acyclic graph (DAG). Such graphs arise in a number of applications, including the project scheduling of the next subsection. DAGs have several interesting properties. Of particular interest in this subsection, the nodes of a DAG can be placed in a list (an order) so that every edge out of a node is to a node further down the list. Expressed another way, every node with an edge into a node p appears before p in the list. Such an ordering of the nodes is called a topological order.

Figure 19.64. A simple directed acyclic graph (DAG)

Consider the DAG of Figure 19.64. It has several topological orderings. Two of them are
1, 2, 7, 6, 5, 4, 3
7, 1, 2, 6, 3, 5, 4
Figure 19.65 shows the graph with the nodes arranged in the first topological ordering. One of the ways to see when a node can be added to a partial list for a topological ordering is to use the constraint that a node p must appear after all other nodes with edges directed into p. Thus, either 1 or 7 can appear first in a topological ordering, as there are no edges directed into either one of them. If 7 is selected first, 1 must be added second, as every other node has an edge directed into it from a node not yet in the list. After 1 occurs in the list, 2 can occur, as the only edge into 2 is from 1. This can be continued until all nodes have been placed in the list. Using this approach, we find that the key as to when a node p can be placed in the list is the number of nodes not in the list with an edge directed into p. When the number is 0 for a node p, p can be added to the list. Therefore, we define the instance variable
p.predCtr = the number of edges directed into p from nodes not in the list
Note that this is the standard indegree at the start of the algorithm when there are no nodes in the list. With this instance variable defined, the general algorithm to obtain a topological list of the nodes becomes
Figure 19.65. A drawing of the previous DAG with the nodes in topological order
find the indegree of each node p and assign the value to p.predCtr
while there is a node not in the list
    find a node p not in the list with a predCtr value of 0
    add p to the end of the topological order list
    update the predCtr values of nodes adjacent from p
Note that if none of the nodes not in the list has predCtr = 0, then there is a cycle in the graph and a topological order does not exist. The implementation of this algorithm is quite straightforward. To find the indegrees, the following can be done:
for each node p
    initialize indegree for p to 0
for each node p
    for each node q adjacent from p
        add 1 to the indegree value for q
In this algorithm the graph's node iterator is used to loop through all nodes, and the graph's edge iterator is used to loop through all edges adjacent from p. Thus, the indegrees can be found in time O(n + m). Returning to the topological sort general algorithm, since there are n nodes, the main loop is done O(n) times. During each pass through the loop, it is necessary to find a node not in the list with a predCtr value of 0. Hence, it can take O(n) time to find such a node. The next step of the algorithm, the addition of a node to the end of a list, can be done in constant time provided that a reference is maintained to the end of the list. Finally, it is easy to update the predCtr values for nodes adjacent from p by iterating through the adjacency list of p. Therefore, an upper bound for the time requirements of the topological sort algorithm is O(n^2). If a list is maintained of nodes with predCtr = 0 and also not in the partial topological order, a somewhat better algorithm can be obtained. The algorithm is
for each node p
    initialize predCtr for p to 0
for each node p
    for each node q adjacent from p
        add 1 to the predCtr value for q
obtain a list, call it ready, of nodes not in the topological order list that have predCtr = 0
while the list ready is not empty
    remove a node p from the list ready
    add p to the end of the topological order list
    for each node q adjacent from p
        subtract 1 from the predCtr value for q
        if q.predCtr = 0
            add q to the list ready
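The ready-list algorithm can be sketched in Java as follows. This is only a minimal sketch under stated assumptions: nodes are numbered 0 to n - 1, the graph is given as a list of adjacency lists, and the names ReadyListTopologicalSort, topologicalOrder, and exampleDag are hypothetical. The small DAG in exampleDag is made up for illustration and is not the graph of Figure 19.64.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Illustrative sketch only; names and the example graph are hypothetical.
class ReadyListTopologicalSort {
    // adj.get(p) holds the nodes adjacent from p; nodes are numbered 0..n-1.
    static List<Integer> topologicalOrder(List<List<Integer>> adj) {
        int n = adj.size();
        int[] predCtr = new int[n];               // Java initializes the counters to 0
        for (int p = 0; p < n; p++) {
            for (int q : adj.get(p)) predCtr[q]++; // count the edges directed into q
        }
        Deque<Integer> ready = new ArrayDeque<>();
        for (int p = 0; p < n; p++) {
            if (predCtr[p] == 0) ready.add(p);
        }
        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int p = ready.remove();
            order.add(p);
            for (int q : adj.get(p)) {
                if (--predCtr[q] == 0) ready.add(q);
            }
        }
        // If some node never reached predCtr = 0, the graph has a cycle.
        if (order.size() < n) {
            throw new IllegalArgumentException("graph contains a cycle");
        }
        return order;
    }

    // A made-up DAG with edges 0->1, 0->2, 1->3, 2->3, 3->4.
    static List<List<Integer>> exampleDag() {
        return Arrays.asList(
            Arrays.asList(1, 2),
            Arrays.asList(3),
            Arrays.asList(3),
            Arrays.asList(4),
            Collections.<Integer>emptyList());
    }

    public static void main(String[] args) {
        System.out.println(topologicalOrder(exampleDag()));
    }
}
```

Because ready is used here as a queue, the sketch returns the order 0, 1, 2, 3, 4 for the example; a stack or any other collection would also yield a valid topological order, possibly a different one.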
For the ready-list algorithm, all steps of the while loop take a constant amount of time except for the loop through the nodes adjacent from p. The time for this loop is the outdegree of p. Summed over all nodes, it becomes the number of edges in the graph [i.e., O(m)]. Therefore, the total time for the algorithm is O(n + m).

Figure 19.66. A digraph whose nodes are to be put in topological order

Another approach to performing a topological sort is to modify the DFS approach introduced in Section 19.7.2. The key to this algorithm is that a node must appear in the list before all nodes adjacent from it. Also, when a DFS of an acyclic graph has been completed at a particular node, all nodes reachable from that node have already had the DFS of them completed. Therefore, when a DFS is completed at a node, the node can be placed in front of the nodes already in the topological order list. Hence, we obtain the following algorithm skeleton to obtain a list containing the nodes of a DAG in topological order:
Procedure topologicalOrder(G)
    for each node p
        mark p not reached
    initialize the topological order list to being empty
    for each node p
        if p isn't marked reached
            scan(p)

Procedure scan(p)
    mark p reached
    for each q adjacent from p
        if q is not marked reached
            scan(q)
    add p to the front of the topological order list
Note that this algorithm, as presented, does not detect cycles. However, it is easy to augment the algorithm to detect a cycle should one exist.
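A minimal Java sketch of the DFS approach follows. It is one possible rendering, not the authors' implementation: the names DfsTopologicalSort, topologicalOrder, scan, and exampleDag are hypothetical, nodes are assumed to be numbered 0 to n - 1, and the example DAG is made up. The three-state marking is one way to add the cycle detection mentioned above; the skeleton in the text uses only a reached mark.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch only; names and the example graph are hypothetical.
class DfsTopologicalSort {
    private static final int UNREACHED = 0;
    private static final int IN_PROGRESS = 1; // scan started but not yet completed
    private static final int DONE = 2;

    static List<Integer> topologicalOrder(List<List<Integer>> adj) {
        int[] state = new int[adj.size()];       // all nodes start UNREACHED
        LinkedList<Integer> order = new LinkedList<>();
        for (int p = 0; p < adj.size(); p++) {
            if (state[p] == UNREACHED) scan(p, adj, state, order);
        }
        return order;
    }

    private static void scan(int p, List<List<Integer>> adj, int[] state,
                             LinkedList<Integer> order) {
        state[p] = IN_PROGRESS;
        for (int q : adj.get(p)) {
            if (state[q] == IN_PROGRESS) {
                // An edge back to a node whose scan is still active closes a cycle.
                throw new IllegalArgumentException("graph contains a cycle");
            }
            if (state[q] == UNREACHED) scan(q, adj, state, order);
        }
        state[p] = DONE;
        order.addFirst(p); // p goes in front of everything reachable from it
    }

    // A made-up DAG with edges 0->1, 0->2, 1->3, 2->3, 3->4.
    static List<List<Integer>> exampleDag() {
        return Arrays.asList(
            Arrays.asList(1, 2),
            Arrays.asList(3),
            Arrays.asList(3),
            Arrays.asList(4),
            Collections.<Integer>emptyList());
    }

    public static void main(String[] args) {
        System.out.println(topologicalOrder(exampleDag()));
    }
}
```

For the example graph the sketch produces 0, 2, 1, 3, 4, which differs from the order a ready-list approach would typically give but is equally valid, since every edge still leads to a node further down the list.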
Problems 19.8.3
1. Obtain a topological order for the nodes of the digraph in Figure 19.66.
2. Obtain a topological order for the nodes of the digraph with the following adjacency list representation: 1 2 3 4 5 6 7 8 5 1, 4 6, 8 5 2, 7 1 7
Figure 19.67. A DAG to represent an expression

3. Using the digraph of Problem 2 as input, give a trace of the modified DFS to obtain a topologically ordered list of the nodes.
4. In compiler construction, an expression such as (x y )(w + z ) (x y )(w z ) can be represented by the directed acyclic graph of Figure 19.67. Obtain a topological order for the symbols of this expression.
5. Implement and test the algorithm to obtain a topological order of the nodes of a DAG, using the approach that employed predCtr.
6. Implement and test the algorithm to obtain a topological order of the nodes of a DAG, using the DFS approach.
19.8.4
Scheduling Networks
Recall that Example 19.6 dealt with scheduling a project. To do such scheduling, a project is modeled by a directed graph in which each node represents an event (a point in time), and each directed edge represents an activity or task to be performed. The weight or numeric value associated with an activity or edge is the time required to perform that activity, called its duration. The events, points in time, are used to coordinate the activities. Thus, if activities a and b must be completed before activities c and d can start, the network is designed so that the edges for a and b are directed into a node from which the edges for c and d originate. This means that a path in the graph is a sequence of activities that must be completed in that order. As a result, the graph can never have a cycle, since the first activity of a cycle cannot be done first and also be done after the last activity of the cycle. The resulting directed graph for a project will have one starting event called the source and one terminating event called the sink. A digraph with these characteristics is called a scheduling network.

One of the primary goals of modeling a project by a scheduling network is to determine the minimum time to complete the project. Another objective is to determine those activities, called critical activities, that must be started and completed on time in order to complete the project in the minimum time. The critical activities are important, as they must be carefully monitored to ensure that they meet their deadlines, or else the completion of the whole project could be delayed. In this section, we formalize and extend the notions introduced in Example 19.6 on page 845, and we present algorithms to determine the minimum completion time, critical activities, and other scheduling information.
Figure 19.68. A simple scheduling network (nodes 1 through 6; activity durations a:2, b:1, c:11, d:2, e:10, f:3, g:9, h:4)

Two basic approaches to scheduling, planning, and controlling projects appeared in the late 1950s: Program Evaluation and Review Technique (PERT) and Critical Path Method (CPM). Although, initially, there were major differences between the two methods, subsequent development of these methods has blurred these differences. Consequently, we do not distinguish between PERT and CPM in the next discussion.

The construction of a scheduling network is constrained by the following rules:
1. The network must have one source and one sink.
2. The network cannot have any cycles.
3. Each activity must be represented by a single directed edge (i.e., the activity is unique) and have a nonnegative duration.
4. At most, one activity can originate at one node and terminate at a specific other node (i.e., the graph must be simple).

The fourth rule may require the use of dummy activities. A dummy activity does not require any time to complete, but it is required, because the graph must be simple and parallel sequencing relationships need to be modeled. As an example, consider the dummy activities d1 and d2 in Figure 19.9 on page 850. Since concurrent activities a2 and a3 must be completed before activity a4 can proceed, one would normally expect a2 and a3 to be directed into the start node of the edge for activity a4. But this would cause two parallel edges, since a2 and a3 have the same initial node. To avoid this, two events (nodes 3 and 4) and two dummy activities (d1 and d2) are added. There are further examples of dummy activities later in the network of Figure 19.9 on page 850.

One of the main objectives of project scheduling is to determine the earliest completion time for the project. We now show how this value can be computed from attributes for the individual events and activities. Note that an event can occur when all activities leading into it have been completed.
Using this concept, we find that the earliest completion time for the project is the earliest time that the last event can occur. Thus, it is useful to determine the earliest occurrence time for each event.

Definition 19.15: The earliest occurrence time for an event j, eot(j), is the earliest time that all incoming activities to that event can be completed.

Therefore, the earliest completion time of the project is eot(t), where t is the sink of the network. We will soon see that the eot values are also useful for other nodes of the network. We now consider the calculation of the eot values. We begin with the simple network in Figure 19.68. Each node in the network has an integer label, and each edge is labeled by a letter and its duration. Vertex 1 has no incoming edges, so it is the source and there is nothing to delay its start. Thus, eot(1) is set to 0. Vertex 2 must wait for activities a and d to complete. Given eot(1) = 0 and the duration of a is 2, a can be completed by time 2. The completion of d is not quite so simple, as d cannot be started until f has completed. Activity f requires 3 time units, so eot(4) = 3, and the earliest completion time of d is given by eot(4) + duration(d) = 5. Since a cannot be completed until time 2 and d cannot be completed until time 5, we obtain eot(2) = 5. Now consider eot(5). Both activities e and g must be completed by eot(5). Activity e cannot start until the activities incoming into node 2 have completed, and e has a duration of 10; consequently, e's earliest completion time is eot(2) + 10 = 15. Similarly, g cannot finish until time 12, so eot(5) = 15. Continuing with the other nodes, eot(3) = 6 and eot(6) = 19.

By analyzing these calculations, it can be seen that the eot of a node is given by the length of the longest path to it from the source. In a general graph, computation of the longest path is very difficult. However, when the graph is directed and acyclic (has no cycles), the computation of the longest path can be done efficiently. Rather than directly computing the longest paths, we will use a slightly different approach and compute the eot values from the earliest completion times for the activities.

Definition 19.16: The earliest completion time for an activity x, ect(x), is the earliest time that the activity can be completed.

It is easy to see that
for each activity x = (i, j)
    ect(i, j) ← eot(i) + duration(i, j)
Note that sometimes an activity is denoted by its label and sometimes by its pair of nodes. Thus, when x is the activity from node i to node j , ect(x) is the same as ect(i, j ). The eot values can be computed by the algorithm
for the source s
    eot(s) ← 0
for each nonsource event j
    eot(j) ← earliest time that all preceding activities can be completed
           = maximum over all activities (i, j) into j, ect(i, j)
           = max over activities (i, j) of ect(i, j)
           = max over activities (i, j) of (eot(i) + duration(i, j))
Note that the maximization is over all activities (i, j) into node j. From this specification, we see that the eot value for a node j can be calculated from the eot values of other nodes, a recursive specification. Also, note that each of these other nodes has an edge directed into j. Recall from the previous subsection that in a topological ordering of the nodes, a node appears after all nodes with edges directed into it. Hence, if the nodes are handled in a topological order, all the eot values needed to calculate eot(j) will be known when j is handled.

This approach involves calculating eot(j) from edges into j when the events are reached in the topological order. However, if we use the adjacency lists graph representation, it is easier to process edges out of, rather than into, a node. This is easily accommodated by small changes to the algorithm. The eot value is known for the source, so its value can be used to set a lower bound on the earliest occurrence time of succeeding events. In general, when a node j is reached in topological order, eot(j) will be known, and it can be used to update the ect values of exiting activities and the eot values of events reached by these edges. When j is reached, eot(j) will have its correct value, as every activity into j will have been processed when the algorithm was at the initial event of that activity. Given that a topological ordering has been obtained, as per the preceding subsection, the result is a simple algorithm to compute the activity ect values and the event eot values:
for each event j
    eot(j) ← 0
for each event i in topological order
    for each j adjacent from i
        ect(i, j) ← eot(i) + duration(i, j)
        eot(j) ← max(eot(j), ect(i, j))
The following table shows the eot time for each node, where the nodes are listed in the order in which they were handled (topological order):
node  1  4  2  5   3  6
eot   0  3  5  15  6  19
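The forward pass can be sketched in Java as shown next. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the topological order is supplied by the caller, and the end nodes of each activity are inferred from the eot, lot, and slack values worked out in this subsection, since the drawing of Figure 19.68 is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names are hypothetical.
class EarliestTimesSketch {
    // Each activity is {from, to, duration}. The durations are those of Figure 19.68;
    // the end nodes are an assumption inferred from the worked values in the text.
    static final int[][] ACTIVITIES = {
        {1, 2, 2},   // a
        {2, 3, 1},   // b
        {3, 6, 11},  // c
        {4, 2, 2},   // d
        {2, 5, 10},  // e
        {1, 4, 3},   // f
        {4, 5, 9},   // g
        {5, 6, 4},   // h
    };

    // Nodes are numbered 1..nodeCount; index 0 of the result is unused.
    static int[] earliestOccurrenceTimes(int nodeCount, int[][] activities,
                                         int[] topologicalOrder) {
        // Build adjacency lists of outgoing activities so the pass is O(n + m).
        List<List<int[]>> out = new ArrayList<>();
        for (int i = 0; i <= nodeCount; i++) out.add(new ArrayList<int[]>());
        for (int[] act : activities) out.get(act[0]).add(act);

        int[] eot = new int[nodeCount + 1]; // Java initializes every eot to 0
        for (int i : topologicalOrder) {
            for (int[] act : out.get(i)) {
                int ect = eot[i] + act[2];                 // ect(i, j)
                eot[act[1]] = Math.max(eot[act[1]], ect);  // eot(j)
            }
        }
        return eot;
    }

    public static void main(String[] args) {
        int[] eot = earliestOccurrenceTimes(6, ACTIVITIES, new int[] {1, 4, 2, 5, 3, 6});
        for (int j = 1; j <= 6; j++) {
            System.out.println("eot(" + j + ") = " + eot[j]);
        }
    }
}
```

With the topological order 1, 4, 2, 5, 3, 6, the sketch yields the eot values 0, 5, 6, 3, 15, 19 for nodes 1 through 6, in agreement with the table.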
As previously stated, the eot value for the sink is the earliest completion time for the overall project. Another useful objective is to compute how much an activity can be delayed without delaying the whole project. This value is called the slack of the activity. Activities with no slack are the critical activities that must be both started as soon as possible and also completed in the minimum time in order to not delay the completion of the whole project. For an activity, its slack is given by the difference between its earliest completion time and the latest time it can complete without delaying the project. The earliest completion times have already been calculated; thus, it only remains to determine how late the activity can complete without delaying the project. It is more convenient to compute the latest occurrence times for events, instead of completion times for activities, so we will use that approach.

Definition 19.17: The latest occurrence time for an event j, lot(j), is the latest time that all incoming activities to that event can be completed without delaying the entire project.

The latest occurrence time for the sink is its earliest occurrence time, which is the earliest completion time of the whole project. For nonsink nodes, their latest occurrence times can be computed from the latest occurrence times of nodes adjacent from them. If the nodes are handled in reverse topological order, then the latest occurrence times for nodes adjacent from them will have already been calculated. Thus, we obtain the following:
for the sink t
    lot(t) ← eot(t)
for each nonsink event i in reverse topological order
    lot(i) ← earliest time that any of the succeeding activities must be started
           = minimum over all activities (i, j) from i, lot(j) - duration(i, j)
           = min over activities (i, j) of (lot(j) - duration(i, j))
    for each activity (i, j) from i
        slack(i, j) ← lot(j) - ect(i, j)
Note that the slack of an activity is easily computed from the latest occurrence time of the destination node and the earliest completion time of the activity. The implementation of this algorithm is straightforward. The following table shows the lot time for each node, where the nodes are listed in the order in which they were handled (reverse topological order):
node  6   3  5   2  4  1
lot   19  8  15  5  3  0
The slack times, in the order calculated by the reverse topological order, are as follows:
activity  c  h  b  e  d  g  a  f
slack     2  0  2  0  0  3  3  0
Because of the way the slack is defined, there is at least one path with all activities on the path being critical (i.e., having no slack). This critical path is a longest path from the source to the sink, where the length of a path is the sum of the weights on its edges. Once the slack values have been obtained, such a path is easily found by using either a DFS or a BFS that follows edges with no slack.

We now consider the time complexity of determining the earliest occurrence times, earliest completion times, latest occurrence times, and the slack values. The previous subsection showed that a topological ordering of the nodes of an acyclic graph can be determined in time O(n + m), where n is the number of nodes and m is the number of edges. To compute the eot for an event and the ect for an activity, one pass is needed through the nodes of the graph in topological order. When at a node, the edges from the node need to be analyzed. If the adjacency lists representation is used, this can be done in time O(n + m). Another scan of the nodes in reverse topological order is needed to compute the lot value for the events and the slack for the activities. If a doubly linked list is used to store the nodes in topological order, a reverse iterator can be used to scan the nodes in the correct order and the edges can be analyzed to determine the lot and slack values. Again, with an adjacency lists representation, this takes time O(n + m). Therefore, the total time is O(n + m). If the graph is stored in an adjacency matrix, it takes O(n) time to determine the edges incident to a node, so the whole algorithm takes time O(n^2).
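The backward pass, the slack computation, and the search for a critical path can be sketched in Java as follows. This is a hedged illustration, not a definitive implementation: all names are hypothetical, the eot values are passed in as already computed, and the end nodes of the activities of Figure 19.68 are inferred from the worked values in the text. The critical path is found by a simple walk along zero-slack activities rather than a full DFS or BFS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names are hypothetical.
class SlackSketch {
    static final String NAMES = "abcdefgh";
    // Each activity is {from, to, duration}; end nodes are inferred from the text.
    static final int[][] ACTIVITIES = {
        {1, 2, 2}, {2, 3, 1}, {3, 6, 11}, {4, 2, 2},
        {2, 5, 10}, {1, 4, 3}, {4, 5, 9}, {5, 6, 4},
    };

    // eot is indexed by node number (index 0 unused); the result is indexed the same way.
    static int[] latestOccurrenceTimes(int[] eot, int[] topologicalOrder, int sink) {
        List<List<int[]>> out = new ArrayList<>();
        for (int i = 0; i < eot.length; i++) out.add(new ArrayList<int[]>());
        for (int[] act : ACTIVITIES) out.get(act[0]).add(act);

        int[] lot = new int[eot.length];
        Arrays.fill(lot, eot[sink]); // no event may occur later than the project's end
        for (int k = topologicalOrder.length - 1; k >= 0; k--) {
            int i = topologicalOrder[k];
            for (int[] act : out.get(i)) {
                lot[i] = Math.min(lot[i], lot[act[1]] - act[2]);
            }
        }
        return lot;
    }

    // slack(i, j) = lot(j) - ect(i, j), returned in the order of ACTIVITIES.
    static int[] slacks(int[] eot, int[] lot) {
        int[] slack = new int[ACTIVITIES.length];
        for (int k = 0; k < ACTIVITIES.length; k++) {
            int[] act = ACTIVITIES[k];
            int ect = eot[act[0]] + act[2];
            slack[k] = lot[act[1]] - ect;
        }
        return slack;
    }

    // Walk from the source to the sink, always leaving a node by a zero-slack activity.
    static String criticalPath(int[] slack, int source, int sink) {
        StringBuilder path = new StringBuilder();
        int node = source;
        while (node != sink) {
            int next = -1;
            for (int k = 0; k < ACTIVITIES.length && next < 0; k++) {
                if (ACTIVITIES[k][0] == node && slack[k] == 0) {
                    path.append(NAMES.charAt(k));
                    next = ACTIVITIES[k][1];
                }
            }
            if (next < 0) {
                throw new IllegalStateException("no zero-slack activity leaves node " + node);
            }
            node = next;
        }
        return path.toString();
    }

    public static void main(String[] args) {
        int[] eot = {0, 0, 5, 6, 3, 15, 19};
        int[] lot = latestOccurrenceTimes(eot, new int[] {1, 4, 2, 5, 3, 6}, 6);
        int[] slack = slacks(eot, lot);
        System.out.println(Arrays.toString(slack) + " " + criticalPath(slack, 1, 6));
    }
}
```

For the network of Figure 19.68 the sketch reproduces the lot and slack tables and reports the critical path f, d, e, h, whose durations 3 + 2 + 10 + 4 sum to the project length of 19.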
Problems 19.8.4
1. Make each of the following changes to the software construction example explored in Example 19.6:
(a) Suppose that subsystem 1 makes much more extensive use of the user interface than the other subsystems. As a result, these two parts are integrated and tested before combining the whole system. Assume that the integration of subsystem 1 and the user interface takes three days and that its testing takes four days. Furthermore, assume that the times for the integration and testing of the whole system are reduced to five and three days, respectively.
(b) The decision has been made to use a simpler system for the user interface. As a result, its design only takes four days, its implementation 12 days, and its testing four days.
(c) As a result of the analysis of quality for the system in activity a38, it is determined that major restructuring needs to be done. Thus, the time for activity a39 is increased to 20 days.
Show how these changes affect the scheduling network in Figure 19.9, and note any changes to the critical path. Solve (a), (b), and (c) independently.
2. Formulate a set of activities and precedence relations for getting to the university in the morning. Activities such as the following would be reasonable candidates:
Get up
Table 19.5. Precedence Relationships Activity a b c d e Preceding Activities d Duration 15 8 6 7 12
and Durations for a Scheduling Network f g h i j k l m e a, f g b, h b, h i g c, k 8 10 9 9 8 11 8 9
n k, l 12
Table 19.6. Precedence Relationships and Durations for a Second Scheduling Network
Activity  Preceding Activities  Duration
a                               8
b         a                     4
c         b                     14
d         a                     10
e         a                     9
f         c, d                  18
g         d                     25
h         d, e                  27
i         f, g, h               6
Wake up
Make breakfast
Eat breakfast
Take a shower
Choose clothes
Dress
Gather books and due assignments
Get to university
Represent the activities and precedence relations by a scheduling network.
3. Given the activities, durations, and the preceding activities in Table 19.5, obtain
(a) A scheduling network implied by the relationships;
(b) The eot and lot for each event;
(c) The slack for each activity;
(d) A critical path for the scheduling network.
As further explanation, since a and f are listed as preceding activities for activity g, both a and f must be completed before g can be started.
4. Repeat the previous problem for Table 19.6.
5. The latest completion time can also be defined and computed for each activity. Give an appropriate definition and algorithm to compute this value for each activity.
6. Assuming that the slack values are known, give an algorithm to determine a critical path in a scheduling network.
7. Suppose that the slack values are known and the duration of an activity is changed. In what cases will it increase the earliest completion time for the overall project? If it will increase the overall earliest completion time, how much will it increase the time for the project? Which other activities might have their slack value changed as a result of the duration change? What is involved in determining the new slack values?
    void whatever() {
        s1
        s2
        while (!finished) {
            s3
            if (flag1 == 0) {
                s4
                s5
            } else {
                if (flag2 == 0)
                    s6
                else
                    s7
                s8
            }
        }
        s9
    }
    (a)

    [Control-flow graph with nodes 1: s1, s2; 2: while (!finished); 3: s3, flag1 == 0; 4: s4, s5; 5: flag2 == 0; 6: s6; 7: s7; 8: s8; 9: end of while statement; 10: s9, end of method. Outgoing edges of predicate nodes are labeled T and F.]
    (b)

Figure 19.69. Modeling a module with a control-flow graph
19.8.5 Graphs in Testing
Two broad approaches to identifying test cases were introduced in Chapter 16: white-box, or program-based, testing, and black-box, or specification-based, testing. This subsection explores the use of graphs in testing software. The representation of code by a control-flow graph was briefly discussed in Section 16.4. The control-flow graph approach is now extended to basis path testing. Also, the notion of generating test cases based on data flow (i.e., where variables are defined and subsequently used) is introduced.

Basis Path Testing

Graphs of programs have been used for many years in several areas of computer science, including software testing and compiler optimization. In these areas of application, any program or program fragment written in some procedural (imperative) language can be translated into a flow graph. A program module (procedure or function) consists of a sequence of language constructs from a base set, such as that given in Figure 19.7 on page 846. As an example, the control-flow graph for the Java method skeleton in Figure 19.69a appears in Figure 19.69b. Nodes 1 and 10 denote the start node (source) and finish node (sink) of the procedure, respectively.

A node in a control-flow graph can represent a block of code. A basic block is a program fragment that has only one entry point and whose transfer mechanism between statements of the block is that of proceeding to the next statement. Thus, it has no branch statements, except perhaps the last statement, which may branch to other blocks. An alternate definition is that a basic block is a sequence of program statements organized in such a way that all transfers into the block are to the first statement in the block. Also, if any statement in the block is executed, all statements in the block are executed. Assuming that the statements s1 through s9 in Figure 19.69a are noncontrol statements, such as assignment statements, each node in Figure 19.69b represents a basic block. In the flow graph of Figure 19.8 on page 847, not all nodes represent basic blocks.

    [Two control-flow graphs: (a) for c1 && c2 and (b) for c1 & c2. Each tests c1 and c2 in separate predicate nodes with T and F edges, and reaches statement s only when both are true.]

Figure 19.70. Control-flow graph representations of some compound predicates

It should be noted that a node in a control-flow graph can represent a block of code that ends with the branch part of a statement instead of a complete statement. For example, node 2 represents a simple condition or predicate, whereas node 3 represents the block that ends with the predicate flag1 == 0. We will usually assume that each condition is a simple condition or predicate, that is, a condition that does not contain logical operators such as and and or. However, it is not difficult to handle compound conditions (predicates), since each compound predicate can be decomposed into two simpler predicates. For example, the Java statements
    if (c1 && c2) s();
    if (c1 & c2) s();
can be decomposed into simple predicates to yield the control-flow graphs shown in Figure 19.70.

In the mid-1970s, Thomas McCabe [33] proposed a white-box testing technique called basis path testing. This technique uses the control-flow graph of a module to generate a set of independent paths that must be executed to ensure that all statements and branches have been executed at least once. McCabe defines the cyclomatic complexity of a method's control-flow graph as

    $v(G) = m - n + 2$

where $m$ is the number of edges and $n$ the number of nodes. For example, consider again the procedure and flow graph in Figure 19.69. The cyclomatic complexity of this flow graph is $v(G) = 12 - 10 + 2 = 4$.

The cyclomatic complexity of a flow graph is related to the branch coverage criterion of white-box testing. A collection of paths is called independent if each path has a node or edge not in any of the other paths. It should be noted that a flow graph may have many sets of independent paths. The cyclomatic complexity of a flow graph gives the maximum size of an independent collection of paths. Such a maximum-sized collection of independent paths must cover all nodes and branches. Furthermore, every path not in the collection can be expressed as a combination of the independent paths. A maximum-sized collection is used so that each member of the collection tests as few new things as possible, which makes it easier to find a fault when a failure occurs. For the flow graph of Figure 19.69, four independent paths from the entry node to the exit node are
    P1: 1-2-10
    P2: 1-2-3-4-9-2-10
    P3: 1-2-3-5-6-8-9-2-10
    P4: 1-2-3-5-7-8-9-2-10
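The computation of v(G) and a check of the four paths can be sketched in Java as follows. This is a minimal sketch, not part of the original text: the class and method names are hypothetical, and the edge list is an assumption read off the paths P1 to P4 and the structure of the method skeleton. The independence check uses an incremental reading of the definition (each path, taken in the listed order, must contribute at least one edge that no earlier path used), which is also an assumption about how the definition is applied.

```java
import java.util.HashSet;
import java.util.Set;

class CyclomaticSketch {
    // Assumed edge list for the flow graph of Figure 19.69b (12 edges, 10 nodes).
    static final int[][] EDGES = {
        {1, 2}, {2, 3}, {2, 10}, {3, 4}, {3, 5}, {4, 9},
        {5, 6}, {5, 7}, {6, 8}, {7, 8}, {8, 9}, {9, 2}
    };

    // The four paths P1 to P4 listed in the text.
    static final int[][] PATHS = {
        {1, 2, 10},
        {1, 2, 3, 4, 9, 2, 10},
        {1, 2, 3, 5, 6, 8, 9, 2, 10},
        {1, 2, 3, 5, 7, 8, 9, 2, 10}
    };

    /** Computes v(G) = m - n + 2, where n is the number of distinct edge endpoints. */
    static int cyclomaticComplexity(int[][] edges) {
        Set<Integer> nodes = new HashSet<>();
        for (int[] e : edges) {
            nodes.add(e[0]);
            nodes.add(e[1]);
        }
        return edges.length - nodes.size() + 2;
    }

    /** Returns the set of edges, written "u->v", traversed by a path. */
    static Set<String> edgesOnPath(int[] path) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + 1 < path.length; i++) {
            result.add(path[i] + "->" + path[i + 1]);
        }
        return result;
    }

    /** True if every path lies in the graph and adds an edge not used by earlier paths. */
    static boolean eachPathAddsNewEdge(int[][] edges, int[][] paths) {
        Set<String> graph = new HashSet<>();
        for (int[] e : edges) graph.add(e[0] + "->" + e[1]);
        Set<String> seen = new HashSet<>();
        for (int[] p : paths) {
            Set<String> onPath = edgesOnPath(p);
            if (!graph.containsAll(onPath)) return false; // not a path of the graph
            if (!seen.addAll(onPath)) return false;       // addAll is false if nothing was new
        }
        return true;
    }

    /** True if the paths together traverse every edge of the graph. */
    static boolean coversEveryEdge(int[][] edges, int[][] paths) {
        Set<String> seen = new HashSet<>();
        for (int[] p : paths) seen.addAll(edgesOnPath(p));
        for (int[] e : edges) {
            if (!seen.contains(e[0] + "->" + e[1])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("v(G) = " + cyclomaticComplexity(EDGES));
        System.out.println("each path adds a new edge: " + eachPathAddsNewEdge(EDGES, PATHS));
        System.out.println("all edges covered: " + coversEveryEdge(EDGES, PATHS));
    }
}
```

With the assumed edge list, the four paths together traverse all 12 edges, which is the branch coverage property the text attributes to a maximum-sized collection.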
An alternate definition of the cyclomatic complexity of a control-flow graph is

    $v(G) = p + 1$

where $p$ is the number of simple predicate nodes in the flow graph. The current example flow graph contains three simple predicates; therefore, $v(G) = 3 + 1 = 4$. This definition makes it very easy to compute the cyclomatic complexity of a method directly from its code.

McCabe's basis path technique has some potential drawbacks. First, the testing of a set of independent paths may not be sufficient in many cases. Furthermore, moving from code to a directed graph representation obscures important information that is present in the code. Specifically, the distinction between feasible and infeasible paths is obscured. Recall from Section 16.4 that an infeasible path in a program is one that cannot be executed; it occurs when we cannot assign values to the program's input variables such that the path will be followed. In some control-flow graphs, several of the independent paths can be infeasible. Such situations may be alleviated by examining the flow of data in a program. This approach is briefly explored in the next subsection. Another drawback to McCabe's approach is the difficulty that testers may have in interpreting what is meant by representing some path as a linear combination of independent paths.

Note that McCabe's cyclomatic complexity measure has sometimes been used by software developers to assess the overall complexity of a program module. This use of cyclomatic complexity is based on the belief that a module with more predicates than another module is more complicated than the other module. However, such a belief is based solely on control flow. Although the complexity of a module is partially based on the number of predicates it contains, module complexity also depends on other factors, such as data structure manipulations.

Data-Flow Testing

In the previous subsection, we explored the use of control flow as a white-box testing technique. We now turn our attention to using the notion of control flow in a data-flow analysis. The technique used is similar to that used by compilers to optimize code. Most data-flow techniques select test cases based on where variables are defined and where they are used in a program. Executing such test cases ensures that variable definitions and their subsequent uses in the program are tested. The notions of data flow presented here are based on the approach taken by Rapps and Weyuker [43].

A program path that links the definition and use of a variable is called a du-path. Variables can be used in computations (c-uses) and in predicates or decisions (p-uses). We now elaborate on each of the three types of variable occurrence:
definition: A variable is defined by its occurrence on the left-hand side of an assignment statement, in a read statement, or in a method's signature as a parameter.
c-use: A variable is used for computational purposes when it occurs on the right-hand side of an assignment statement or in a write statement.
p-use: A variable is used for predicate or decision purposes if it occurs within the predicate of a conditional statement (e.g., an if or while statement).

Observe that a c-use of a variable may also affect the flow of control in a program, although it does so indirectly by affecting the p-use of the variable in predicates. Similarly, because a p-use of a variable affects the flow of control through a program, it may also indirectly affect the computations performed.

Data-flow analysis can be performed on a program by representing it as a type of flow graph called a definition-use graph. A program is first partitioned into a set of disjoint basic blocks. Recall from the previous subsection that a basic block is a sequence of program statements organized in such a way that all transfers to the block are to the first statement in the block. Also, if any statement in the block is executed, then all statements in the block are executed.

We now illustrate some of the data-flow notions with an example. Figure 19.71 contains a binary search function with a signature that contains two parameters: an array of integer values (keys) and the value being sought (x). The function returns the outcome of the search via a Boolean value: true for a successful search and false for an unsuccessful search. The function is partitioned into an entry block (s), six interior blocks (numbered 1 to 6), and an exit block (e). Figure 19.71 also displays the three types of variable occurrences in the function: definitions, c-uses, and p-uses. The function's entry block defines keys and x directly. This block also indirectly defines keys.length (the number of items in the array). In block 1, the variables low and high are defined. Furthermore, this block has a c-use of the variable keys.length. In block 2, the predicate makes p-uses of the variables low and high. Block 3 c-uses low and high, defines middle, and p-uses x, keys, and middle. The variable definitions and uses in the remaining blocks are easily obtained.

The definition-use graph for the binary search method appears in Figure 19.72. Each block of code is a node in the graph. As was the case in control-flow graphs, an outgoing directed edge from a block indicates the block's executional successor. If present, a conditional statement is always the last instruction of a block and has two executional successors. Nodes 2 and 3 in the flow graph represent predicate nodes. This time, rather than placing the condition of a predicate block beside the node for the block, the predicate that is true is placed beside the directed edge emanating from the node.

A du-path is a path from a node with the definition of a variable to a node with a use of the variable. In the example program, there are 13 du-paths. The du-path (s, 1) links the definition of keys.length to its subsequent c-use in node 1. The du-path (s, 1, 2, 3) links the definition of x and keys to their subsequent p-use in node 3. The du-path (s, 1, 2, 6) links the definition of the parameters in the function's signature to their subsequent use in node 6. The du-path (1, 2) links the definition of low and high to their subsequent p-use in node 2. The du-path (1, 2, 3) links the definition of low and high to their subsequent c-use in node 3. Similarly, the du-path (1, 2, 6) links the definition of low to its subsequent use in node 6. Since we are only concerned with the flow of data between nodes, we consider only a variable's global uses; that is, a variable's uses outside the block in which it was defined. For example, since middle is both defined and p-used in block 3, we ignore this du-path.
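The enumeration of du-paths described above can be sketched as a small search over the definition-use graph. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the successor lists and the per-block definition and global-use sets are transcribed from the description of the binary search blocks, keys.length is treated as a variable of its own, and only loop-free paths on which the variable is not redefined are followed. Paths are identified by their node sequence, so a path that serves several variables is counted once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DuPathSketch {
    static final Map<String, List<String>> SUCC = new LinkedHashMap<>();
    static final Map<String, Set<String>> DEFS = new LinkedHashMap<>();
    static final Map<String, Set<String>> USES = new LinkedHashMap<>();

    static Set<String> vars(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    static void edge(String from, String to) {
        SUCC.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    static {
        // Executional successors of each block (assumed from the definition-use graph).
        edge("s", "1"); edge("1", "2"); edge("2", "3"); edge("2", "6");
        edge("3", "4"); edge("3", "5"); edge("4", "2"); edge("5", "2"); edge("6", "e");
        // Variables defined in each block.
        DEFS.put("s", vars("keys", "x", "keys.length"));
        DEFS.put("1", vars("low", "high"));
        DEFS.put("3", vars("middle"));
        DEFS.put("4", vars("low"));
        DEFS.put("5", vars("high"));
        // Global uses only: the p-use of middle in block 3 follows its definition there.
        USES.put("1", vars("keys.length"));
        USES.put("2", vars("low", "high"));
        USES.put("3", vars("low", "high", "x", "keys"));
        USES.put("4", vars("middle"));
        USES.put("5", vars("middle"));
        USES.put("6", vars("keys", "low", "x"));
    }

    /** Returns every distinct du-path, written as a node list such as "[s, 1, 2, 3]". */
    static Set<String> duPaths() {
        Set<String> found = new TreeSet<>();
        for (Map.Entry<String, Set<String>> def : DEFS.entrySet()) {
            for (String variable : def.getValue()) {
                Deque<String> path = new ArrayDeque<>();
                path.addLast(def.getKey());
                extend(variable, path, found);
            }
        }
        return found;
    }

    /** Extends the path one block at a time until the variable is redefined. */
    static void extend(String variable, Deque<String> path, Set<String> found) {
        for (String next : SUCC.getOrDefault(path.peekLast(), Collections.emptyList())) {
            if (path.contains(next)) continue; // keep the path loop-free
            path.addLast(next);
            if (USES.getOrDefault(next, Collections.emptySet()).contains(variable)) {
                found.add(path.toString());
            }
            if (!DEFS.getOrDefault(next, Collections.emptySet()).contains(variable)) {
                extend(variable, path, found);
            }
            path.removeLast();
        }
    }

    public static void main(String[] args) {
        Set<String> paths = duPaths();
        System.out.println(paths.size() + " du-paths: " + paths);
    }
}
```

Under these assumptions the search finds 13 distinct node sequences, in agreement with the count given in the text.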
    boolean binarySearch(int[] keys, int x) {
        /* low and high are the ends of the current interval,
           middle is the index of the middle of the current interval */
        int low, high, middle;
        low = 0;
        high = keys.length - 1;
        while (low < high) {
            middle = (low + high) / 2;
            if (x > keys[middle])
                low = middle + 1;
            else
                high = middle;
        }
        return keys[low] == x;
    }

    Block  Statements                                        Definitions  c-uses        p-uses
    s      (method signature)                                keys, x
    1      low = 0; high = keys.length - 1;                  low, high    keys.length
    2      while (low < high)                                                           low, high
    3      middle = (low + high) / 2; if (x > keys[middle])  middle       low, high     x, keys, middle
    4      low = middle + 1;                                 low          middle
    5      high = middle;                                    high         middle
    6      return keys[low] == x;                                         keys, low, x

Figure 19.71. An example illustrating du-paths
    [Definition-use graph with nodes s, 1, 2, 3, 4, 5, 6, and e, and edges s-1, 1-2, 2-3, 2-6, 3-4, 3-5, 4-2, 5-2, and 6-e.]
    Node contents:
        1: low = 0; high = keys.length - 1;
        3: middle = (low + high) / 2;
        4: low = middle + 1;
        5: high = middle;
        6: return keys[low] == x;
    Edge labels:
        2 to 3: low < high
        2 to 6: low >= high
        3 to 4: x > keys[middle]
        3 to 5: x <= keys[middle]
    du-paths:
        (s, 1)  (s, 1, 2, 3)  (s, 1, 2, 6)
        (1, 2)  (1, 2, 3)  (1, 2, 6)
        (4, 2)  (4, 2, 3)  (4, 2, 6)
        (5, 2)  (5, 2, 3)
        (3, 4)  (3, 5)

Figure 19.72. Definition-use graph for a binary search method
    for (int i = 0; i < n; i++)
        for (int j = i+1; j < n; j++)
            if (a[i] < a[j])
                swap(a[i], a[j]);

Figure 19.73. Program fragment for determination of basis paths

The du-paths (4, 2), (4, 2, 3), and (4, 2, 6) link the definition of low to its subsequent uses: a p-use in node 2, a c-use in node 3, and a c-use in node 6. Similarly, the du-paths (5, 2) and (5, 2, 3) link the definition of high to its subsequent uses: a p-use in node 2 and a c-use in node 3. Finally, the du-paths (3, 4) and (3, 5) link the definition of middle to its subsequent c-uses in nodes 4 and 5. These du-paths are summarized in Figure 19.72.

The primary objective is to formulate a minimal number of paths required to cover all du-paths. For our example, the following three paths cover all du-paths: (s, 1, 2, 6, e), (s, 1, 2, 3, 5, 2, 3, 5, 2, 6, e), and (s, 1, 2, 3, 4, 2, 3, 4, 2, 6, e). The last step is to generate a data set for each of these three paths and execute the program to ensure that it obtains the correct result.
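The last step above, choosing a data set for each covering path, can be sketched by instrumenting the binary search so that it records the blocks it visits. This is a minimal sketch, not the authors' test harness: the class name, the traceOf helper, and the three input arrays are illustrative assumptions chosen so that each input drives one of the three paths.

```java
import java.util.ArrayList;
import java.util.List;

class PathTraceSketch {
    /** The binary search method of Figure 19.71. */
    static boolean binarySearch(int[] keys, int x) {
        int low, high, middle;
        low = 0;
        high = keys.length - 1;
        while (low < high) {
            middle = (low + high) / 2;
            if (x > keys[middle])
                low = middle + 1;
            else
                high = middle;
        }
        return keys[low] == x;
    }

    /** Same control flow as binarySearch, but returns the sequence of blocks visited. */
    static String traceOf(int[] keys, int x) {
        List<String> visited = new ArrayList<>();
        visited.add("s");
        int low = 0, high = keys.length - 1, middle;
        visited.add("1");
        while (true) {
            visited.add("2");                 // block 2: loop predicate
            if (!(low < high)) break;
            middle = (low + high) / 2;
            visited.add("3");                 // block 3: compute middle, compare
            if (x > keys[middle]) {
                low = middle + 1;
                visited.add("4");
            } else {
                high = middle;
                visited.add("5");
            }
        }
        visited.add("6");                     // block 6: the return statement
        visited.add("e");
        return String.join(", ", visited);
    }

    public static void main(String[] args) {
        // Hypothetical data sets, one per covering path.
        int[][] inputs = { {5}, {1, 2, 3}, {1, 2, 3, 4} };
        int[] targets = { 5, 1, 4 };
        for (int i = 0; i < inputs.length; i++) {
            System.out.println("(" + traceOf(inputs[i], targets[i]) + ") found: "
                    + binarySearch(inputs[i], targets[i]));
        }
    }
}
```

A one-element array skips the loop entirely, searching for the smallest of three keys takes the else branch twice, and searching for the largest of four keys takes the then branch twice. Each search should also return true, which is the correct result to check for these inputs.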
Problems 19.8.5
1. Consider the program fragment of Figure 19.73, and
   (a) Obtain a control-flow graph for this fragment for the purpose of determining linearly independent basis paths.
   (b) Determine the cyclomatic complexity for the graph in part (a).
   (c) Generate a set of basis paths for the graph in part (a).
2. For the partition method on page 808,
   (a) Obtain a control-flow graph for this method for the purpose of determining linearly independent basis paths.
   (b) Determine the cyclomatic complexity for the graph in part (a).
   (c) Generate a set of basis paths for the graph in part (a).
3. Obtain all the du-paths and the minimal test paths that will cover them for the program fragment of Figure 19.74.
4. Suppose three numbers a, b, and c are given in descending order, representing the lengths of the sides of a triangle. The problem is to determine the type of the triangle (whether it is isosceles, equilateral, right angled, obtuse, or acute). For the program of Figure 19.75, obtain all the du-paths and the minimal test paths that will cover them.

    BufferedReader consoleInput = new BufferedReader(
            new InputStreamReader(System.in));
    double x = Double.parseDouble(consoleInput.readLine());
    int y = Integer.parseInt(consoleInput.readLine());
    int pow;
    if (y < 0)
        pow = -y;
    else
        pow = y;
    double z = 1.0;
    while (pow != 0)
    {
        z = z * x;
        pow = pow - 1;
    }
    if (y < 0)
        z = 1.0 / z;
    System.out.print(z);

Figure 19.74. Program fragment for determination of du-paths
    BufferedReader consoleInput = new BufferedReader(
            new InputStreamReader(System.in));
    int a = Integer.parseInt(consoleInput.readLine());
    int b = Integer.parseInt(consoleInput.readLine());
    int c = Integer.parseInt(consoleInput.readLine());
    if ((a < b) | (b < c))
        System.out.print("Illegal inputs");
    else if ((a == b) | (b == c))
        if ((a == b) & (b == c))
            System.out.print("Equilateral triangle.");
        else
            System.out.print("Isosceles triangle.");
    int h = a * a;
    int d = b * b + c * c;
    if (h == d)
        System.out.print("Right triangle.");
    else if (h < d)
        System.out.print("Acute triangle.");
    else
        System.out.print("Obtuse triangle.");

Figure 19.75. Program to categorize triangles

19.9 Concluding Remarks

This chapter introduced the graph data structure. A graph provides a good model of a relationship between pairs of items: each item is associated with a vertex, and if two items are related, then an edge is used to join them. By analyzing the graph, many properties of the relationship can be obtained. Among the basic properties, connectivity is an important one. Often weights are associated with the relationship, for example, distances, times, or costs. This leads to problems like shortest path, longest path, and minimum spanning tree. In particular, the longest path in the scheduling network for a software project determines the earliest completion time for the project.

There are two standard ways to store a graph in memory. The adjacency matrix representation is more natural and provides fast access to the edge connecting two vertices. However, it makes access to all edges incident to a given vertex somewhat slower. In contrast, the adjacency lists representation provides fast access to all incident edges, but slower access to the particular edge connecting two specified vertices.

Three general approaches are used to process a graph. Matrix manipulation techniques are often useful to process the adjacency matrix of a graph. For the other two approaches, BFS and DFS, the adjacency lists representation of graphs is usually used. A BFS processes vertices close to the source first, whereas a DFS is better for determining specific information about an individual path as it traces paths one by one. In general, the DFS is more often used than the other two.

We have barely touched on the multitude of graph applications, properties, and algorithms. The interested reader should consult a reference such as [9].
