
Graph Data Structure

Recall the ADT binary tree: a tree structure used mainly to represent one-to-two relations,
i.e. each item has at most two immediate successors.
Limitations of tree structures:
An item in a tree can appear in one position only.
An item has exactly one predecessor (its parent node), unless it is the root, which has none.
The only available links are parent/child links.
E.g.: attempting to represent transport link data with a tree structure:

Graph
A graph is a pair (V, E), where V is a set of nodes, called vertices, and E is a collection of
pairs of vertices, called edges. E may contain duplicates, i.e. the same pair may appear more than once.
Vertices and edges are data structures and store elements
Example:
A vertex represents an airport and stores the three-letter airport code
An edge represents a flight route between two airports and stores the mileage of the route


Applications
Electronic circuits
o Printed circuit board
o Integrated circuit
Transportation networks
o Highway network
o Flight network
Computer networks
o Local area network
o Internet
o Web
Databases
o Entity-relationship diagram







Graph Terminology

A graph is a pair (V, E), where V is a set of nodes, called vertices, and E is a collection of
pairs of vertices, called edges. E may contain duplicates, i.e. the same pair may appear more than once.
Edges, also called arcs, are represented by (u, v) and are either:

Directed if the pairs are ordered (u, v)
u the origin
v the destination

Undirected if the pairs are unordered

Then a graph can be:

Directed graph (di-graph) if all the edges are directed
Undirected graph (graph) if all the edges are undirected
Mixed graph if it contains both directed and undirected edges







Illustrate terms on graphs
End-vertices of an edge are the endpoints of the edge.
Two vertices are adjacent if they are endpoints of the same edge.

Undirected graph - an edge (u, v) means v is adjacent to u, and vice versa.
Example: given the edge (2, 3), vertices 2 and 3 are adjacent to each other; (2, 3) and (3, 2)
denote the same edge.

Directed graph - an edge (u, v) means v is adjacent to u; u is adjacent to v only if there is
also an edge (v, u).
Note that this is the opposite of the definition used by most other authors, but the
distinction can generally be ignored in the algorithms presented in the text.
Example: (3, 2) means 2 is adjacent to 3, but 3 is not adjacent to 2 (unless the edge (2, 3) also exists).
Adjacent vertices are said to be neighbors.

An edge is incident on a vertex if the vertex is an endpoint of the edge.
Outgoing edges of a vertex are the directed edges whose origin is that vertex.
Incoming edges of a vertex are the directed edges whose destination is that vertex.
Degree - the number of edges incident on a vertex.
Undirected graph - the degree of a vertex equals the number of edges incident on it.
Example: vertex 2 has degree 4.

Directed graph:
In-degree - equals the number of edges that have their head at the vertex, i.e. that target
the vertex.
Example: vertex 2 has in-degree 2.

Out-degree - equals the number of edges emanating from the vertex.
Example: vertex 2 has out-degree 1.
The degree of a vertex v is denoted deg(v),
its out-degree is denoted outdeg(v),
and its in-degree is denoted indeg(v).

Parallel edges or multiple edges are edges of the same type (directed or undirected) that share
the same end-vertices.




A path is a sequence of alternating vertices and edges in which each edge connects the vertex
before it to the vertex after it. Frequently only the vertices are listed, especially if there are
no parallel edges.
A path of length k from u to u' in G = (V, E) is a sequence of vertices <v_0, v_1, ..., v_k> where
u = v_0, u' = v_k, and (v_{i-1}, v_i) ∈ E for i = 1, 2, ..., k.
|P| = path length - equals the number of edges in the path.
There is always a 0-length path from u to u.
Reachable - given u and u', u' is reachable from u if there exists a path P from u to u'.
A simple path is a path with distinct vertices.
A directed path is a path consisting only of directed edges.
A directed cycle is a cycle consisting only of directed edges.
A sub-graph of a graph G is a graph whose vertices and edges are subsets of the vertices and edges of G.
All of these graphs are sub-graphs of the first graph.


Cycle is a path that starts and ends at the same vertex.
In Directed Graphs:
a cycle path (circuit) is a path <v_0, v_1, ..., v_k> such that v_0 = v_k and |P| ≥ 1
a simple cycle is a cycle in which v_1, ..., v_k are distinct, so it does not contain the same
edge more than once.
a self loop is a cycle of length 1, and a directed graph with no self loops is
referred to as simple
Directed Acyclic Graph (DAG) - a directed graph with no cycles (circuits)
In Undirected Graphs:
a cycle path (circuit) is a path <v_0, v_1, ..., v_k> such that v_0 = v_k and |P| ≥ 1
a simple cycle is a cycle in which v_1, ..., v_k are distinct and |P| ≥ 3
(1,2,1) is not a simple cycle.
Undirected Acyclic Graph - an undirected graph with no simple cycles (circuits)







Simple graphs have no parallel edges or self-loops





Connected graph vs. strongly connected graph vs. complete graph
Connected is usually associated with undirected graphs (two-way edges): there is a path
between every two nodes.
Strongly connected is usually associated with directed graphs (one-way edges): there is a
directed path from every node to every other node, so any two nodes can reach each other.
Complete graphs are undirected graphs where there is an edge between every pair of
distinct nodes.




A connected component is a maximal connected sub-graph of a graph; a graph that is not connected
has more than one. Following is a graph with three connected components.
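
The definition above can be illustrated with a minimal Python sketch that groups the vertices of an undirected graph into connected components. The function name connected_components, the dict-of-lists input format and the six-vertex sample graph are assumptions made for illustration; they are not taken from the original figure.

```python
def connected_components(adj):
    """Return the connected components of an undirected graph.

    adj maps every vertex to a list of its neighbors (assumed format).
    """
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Explore everything reachable from start with an explicit stack.
        seen.add(start)
        component, stack = [], [start]
        while stack:
            v = stack.pop()
            component.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components


# Hypothetical graph with three components: {1, 2, 3}, {4, 5} and {6}.
sample = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
components = connected_components(sample)
print(len(components))  # 3
```

Each vertex and each edge is examined a constant number of times, so the sketch runs in O(V+E) time.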

Strongly connected components - if a directed graph is not strongly connected, it can be divided into
strongly connected components: each component is a maximal strongly connected sub-graph built on
a subset of the vertices of V.


Main Methods of the Graph ADT

Constructor operations to construct a graph:
i. createGraph( )
// post: creates an empty graph
ii. addNode(newItem)
// post: adds newItem in a graph. No change if newItem already exists.
iii. addEdge(Node1, Node2)
// post: adds an edge between two existing nodes (Node1 and Node2)

Predicate operations to test a graph:
i. isEmpty( )
// post: determines whether a graph is empty
ii. isLink(Node1, Node2)
// post: returns true if edge (Node1, Node2) is present in the graph

Deletion operations to remove items from a graph:
i. deleteNode(Node)
// post: removes a node and any edges between this node and other nodes.
ii. deleteEdge(Node1, Node2)
// post: deletes the edge between the two given nodes Node1, Node2.
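
The operations listed above could be realized in many ways. The following is only a minimal Python sketch for an undirected graph, assuming a dictionary that maps each node to the set of its neighbors. The class name Graph, the snake_case method names and the sample airport codes are choices made for this illustration.

```python
class Graph:
    """Sketch of the graph ADT for an undirected graph (assumed storage: dict of sets)."""

    def __init__(self):                 # createGraph()
        self._adj = {}

    def add_node(self, item):           # addNode(newItem)
        # No change if the item already exists.
        self._adj.setdefault(item, set())

    def add_edge(self, a, b):           # addEdge(Node1, Node2)
        if a not in self._adj or b not in self._adj:
            raise KeyError("both nodes must already exist")
        self._adj[a].add(b)
        self._adj[b].add(a)

    def is_empty(self):                 # isEmpty()
        return not self._adj

    def is_link(self, a, b):            # isLink(Node1, Node2)
        return a in self._adj and b in self._adj[a]

    def delete_node(self, node):        # deleteNode(Node)
        # Remove the node, then remove it from every neighbor's set.
        for neighbor in self._adj.pop(node, set()):
            if neighbor != node:
                self._adj[neighbor].discard(node)

    def delete_edge(self, a, b):        # deleteEdge(Node1, Node2)
        if self.is_link(a, b):
            self._adj[a].discard(b)
            self._adj[b].discard(a)


g = Graph()
for code in ("BOM", "DEL", "BLR"):      # illustrative airport codes
    g.add_node(code)
g.add_edge("BOM", "DEL")
g.add_edge("DEL", "BLR")
print(g.is_link("DEL", "BOM"))          # True
g.delete_node("DEL")
print(g.is_link("BOM", "DEL"))          # False
```

A set per node keeps isLink and deleteEdge at expected O(1) time, at the cost of not supporting parallel edges.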







Graph Representations

While several representations for graphs are possible, we shall study only the three most
commonly used: adjacency matrices, adjacency lists and adjacency multi-lists. Once again, the
choice of a particular representation will depend upon the application one has in mind and the
functions one expects to perform on the graph.

Adjacency Matrix
An adjacency matrix is a 2D array of size V x V, where V is the number of vertices in the graph. Let
the 2D array be adj[][]; a slot adj[i][j] = 1 indicates that there is an edge from vertex i to vertex j,
and adj[i][j] = 0 indicates that there is none. The adjacency matrix of an undirected graph is always
symmetric. An adjacency matrix can also represent a weighted graph: if adj[i][j] = w, then there is
an edge from vertex i to vertex j with weight w.
[A weighted graph is a graph (here a digraph) which has values attached to its edges. These values
represent the cost of travelling from one node to the next. The cost can be measured in many
terms, depending upon the application. For instance, the 3 digraphs below all represent the same
thing: a graph of airport terminals and flight lines between those terminals. The only difference is
the meaning of the weights of the edges: graph A represents the flight distance between
terminals; graph B represents the average flight time in minutes; and graph C represents the
dollar cost of a plane ticket between the hubs. Looking at graphs A and B, you can see that a
direct flight from hub 1 to 3 costs less in both time and miles than the more
roundabout route from hub 1 to 2 to 3. However, in graph C, the tickets for the
roundabout route cost less than the direct flight.]

The adjacency matrix representations for different graphs are as follows.







Pros: The representation is easy to implement and follow. Removing an edge takes O(1) time.
Queries such as whether there is an edge from vertex u to vertex v are efficient and take
O(1) time.
Cons: Consumes O(V^2) space, even if the graph is sparse (contains few edges). Adding a vertex
takes O(V^2) time, since a fixed-size matrix has to be reallocated and copied.
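
As a minimal Python sketch of the adjacency matrix described above, the code below builds adj for a small graph and shows the O(1) edge query and edge removal. The helper name build_matrix, the directed flag and the sample edges are assumptions made for illustration.

```python
def build_matrix(n, edges, directed=False):
    """Return an n x n adjacency matrix for vertices numbered 0..n-1."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = 1
        if not directed:
            adj[v][u] = 1      # undirected: keep the matrix symmetric
    return adj


# Hypothetical undirected graph on 4 vertices.
adj = build_matrix(4, [(0, 1), (0, 2), (1, 2), (2, 3)])

had_edge = adj[0][2] == 1      # O(1) query: is there an edge from 0 to 2?
adj[0][2] = adj[2][0] = 0      # O(1) removal of the undirected edge (0, 2)

for row in adj:
    print(row)
```

For a weighted graph, the same layout could store the weight w in adj[u][v] instead of 1, with some agreed value marking "no edge".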
Adjacency List
An array of linked lists is used. The size of the array is equal to the number of vertices. Let the
array be array[]. An entry array[i] represents the linked list of vertices adjacent to the ith vertex.
This representation can also be used to represent a weighted graph: the weights of edges can be
stored in the nodes of the linked lists. Following is the adjacency list representation of a graph.




Pros: Saves space: O(|V|+|E|). In the worst case there can be C(V, 2) edges in a graph,
which consumes O(V^2) space. Adding a vertex is easier.
Cons: Queries such as whether there is an edge from vertex u to vertex v are not efficient and can
take O(V) time.
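
The adjacency list idea can be sketched in Python as follows. This is an illustration under stated assumptions: Python lists stand in for the linked lists, each entry is a (neighbor, weight) pair, and the names build_adj_list and has_edge are hypothetical.

```python
def build_adj_list(n, weighted_edges):
    """Return a list of n lists; entry i holds (neighbor, weight) pairs of vertex i."""
    array = [[] for _ in range(n)]
    for u, v, w in weighted_edges:
        array[u].append((v, w))
        array[v].append((u, w))   # undirected: store the edge on both lists
    return array


def has_edge(array, u, v):
    """Scan the list of u; this takes time proportional to the degree of u."""
    return any(neighbor == v for neighbor, _ in array[u])


# Hypothetical weighted undirected graph on 4 vertices.
array = build_adj_list(4, [(0, 1, 7), (0, 2, 3), (2, 3, 5)])
print(array[0])              # [(1, 7), (2, 3)]
print(has_edge(array, 3, 2)) # True
print(has_edge(array, 1, 3)) # False
```

Adding a vertex only requires appending one empty list, which is why the notes call it easier than in the matrix representation.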


Inverse Adjacency List
As the name suggests, the inverse adjacency list is the opposite of the adjacency list. The graph
is again represented by n linked lists, but here list i holds the vertices that have an edge into
vertex i, i.e. every vertex j such that (j, i) is an edge. This representation is used to find the
in-degree of a node, which equals the length of the node's list.
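
A short Python sketch of this idea, assuming vertices numbered 0..n-1 and Python lists in place of linked lists (the function name and the sample edges are illustrative only):

```python
def build_inverse_adj_list(n, directed_edges):
    """List i collects every vertex u that has a directed edge (u, i)."""
    inverse = [[] for _ in range(n)]
    for u, v in directed_edges:
        inverse[v].append(u)
    return inverse


# Hypothetical digraph: 0 -> 1, 0 -> 2, 1 -> 2, 3 -> 2
inverse = build_inverse_adj_list(4, [(0, 1), (0, 2), (1, 2), (3, 2)])
indeg = [len(sources) for sources in inverse]
print(inverse[2])  # [0, 1, 3]
print(indeg)       # [0, 1, 3, 0]
```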






Adjacency Multi-lists
In the adjacency list representation of an undirected graph, each edge (vi, vj) is represented by two
entries, one on the list for vi and the other on the list for vj. As we shall see, in some situations it
is necessary to be able to determine the second entry for a particular edge and mark that edge as
already having been examined. This can be accomplished easily if the adjacency lists are actually
maintained as multi-lists, i.e. lists in which nodes may be shared among several lists. For each edge
there will be exactly one node, but this node will be in two lists, i.e., the adjacency lists for each
of the two nodes it is incident to. The node structure now becomes where


M is a one bit mark field that may be used to indicate whether or not the edge has been
examined. The storage requirements are the same as for normal adjacency lists except for the
addition of the mark bit M.



Graph Traversal

The BFS and DFS algorithms are to be covered from Fundamentals of Data Structures in C by
Horowitz and Sahni.


Application of DFS

Depth-first search (DFS) is an algorithm (or technique) for traversing a graph.
Following are the problems that use DFS as a building block
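1) For an unweighted connected graph, the tree produced by a DFS traversal is a spanning tree;
since all edges count the same, it is also a minimum spanning tree. Note that DFS does not in
general produce shortest paths; BFS is used for that (see Application of BFS).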
2) Path Finding
We can specialize the DFS algorithm to find a path between two given vertices u and z.
i) Call DFS(G, u) with u as the start vertex.
ii) Use a stack S to keep track of the path between the start vertex and the current vertex.
iii) As soon as destination vertex z is encountered, return the path as the
contents of the stack
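
Steps i) to iii) of path finding can be sketched in Python as below. This is a recursive version under stated assumptions: the graph is a dict mapping each vertex to a list of adjacent vertices, and the function name dfs_path and the sample graph are hypothetical. The path it returns is a valid path, not necessarily the shortest one.

```python
def dfs_path(adj, u, z):
    """Return one path from u to z as a list of vertices, or None if z is unreachable."""
    visited = set()
    stack = []                      # S: the path from u to the current vertex

    def visit(v):
        visited.add(v)
        stack.append(v)
        if v == z:                  # destination reached: the stack is the path
            return True
        for w in adj.get(v, []):
            if w not in visited and visit(w):
                return True
        stack.pop()                 # dead end: v is not on the path
        return False

    return list(stack) if visit(u) else None


# Hypothetical directed graph.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(dfs_path(graph, "A", "D"))    # ['A', 'B', 'D']
print(dfs_path(graph, "D", "A"))    # None
```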
3) Topological Sorting (of a directed acyclic graph)
(https://www.cs.usfca.edu/~galles/visualization/TopoSortDFS.html)
i) Call DFS(G, u) with u as the start vertex.
ii) Note the start time of each node when its recursive call begins.
iii) Note the finish time of each node when its recursive call returns, and add the node at the
front of a list.
iv) Repeat steps i) to iii) from every vertex that is still unvisited.
v) Output the list.
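
A minimal Python sketch of the topological sorting steps above. It assumes the input is a directed acyclic graph given as a dict in which every vertex appears as a key, and it does not detect cycles. Start and finish times are left implicit: a vertex is added to the front of the list at the moment its recursive call returns. The names used are illustrative.

```python
from collections import deque


def topological_sort(adj):
    """Return the vertices of a DAG so that every edge goes from left to right."""
    visited = set()
    order = deque()

    def visit(v):
        visited.add(v)
        for w in adj[v]:
            if w not in visited:
                visit(w)
        order.appendleft(v)         # v has finished: add it at the front

    for v in adj:                   # restart from every unvisited vertex
        if v not in visited:
            visit(v)
    return list(order)


# Hypothetical DAG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
dag = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(topological_sort(dag))        # [0, 2, 1, 3]
```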
4) Solving puzzles with only one solution, such as mazes.
5) Finding Strongly Connected Components of a graph
A directed graph is strongly connected if there is a directed path from every vertex to every
other vertex. A strongly connected component (SCC) of a directed graph is a maximal strongly
connected subgraph. For example, there are 3 SCCs in the following graph.

We can find all strongly connected components in O(V+E) time using Kosaraju's
algorithm. The algorithm in detail:
1) Create an empty stack S and do a DFS traversal of the graph. In the DFS traversal, after
the recursive DFS calls for the adjacent vertices of a vertex have returned, push the vertex onto the stack.
2) Reverse the directions of all arcs to obtain the transpose graph.
3) Pop vertices from S one by one while S is not empty. Let the popped vertex be v.
If v has not been visited in this pass, take v as the source and do a DFS on the transpose graph.
The DFS starting from v visits (prints) the strongly connected component of v.
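
The three steps of Kosaraju's algorithm can be sketched in Python as follows. The sketch assumes a dict in which every vertex appears as a key, uses recursion (so very deep graphs would need an iterative variant), and runs on a hypothetical five-vertex digraph rather than the graph from the original figure.

```python
def kosaraju_scc(adj):
    """Return the strongly connected components of a digraph as lists of vertices."""
    visited = set()
    stack = []                                  # S, filled in finishing order

    def fill(v):                                # step 1: DFS on the original graph
        visited.add(v)
        for w in adj[v]:
            if w not in visited:
                fill(w)
        stack.append(v)                         # push after all adjacent calls return

    for v in adj:
        if v not in visited:
            fill(v)

    transpose = {v: [] for v in adj}            # step 2: reverse every arc
    for v in adj:
        for w in adj[v]:
            transpose[w].append(v)

    visited.clear()
    components = []

    def collect(v, component):                  # step 3: DFS on the transpose graph
        visited.add(v)
        component.append(v)
        for w in transpose[v]:
            if w not in visited:
                collect(w, component)

    while stack:
        v = stack.pop()
        if v not in visited:
            component = []
            collect(v, component)
            components.append(component)
    return components


# Hypothetical digraph: 0 -> 1 -> 2 -> 0 forms one SCC; 3 and 4 are SCCs on their own.
digraph = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: []}
sccs = kosaraju_scc(digraph)
print(sccs)                                     # [[0, 2, 1], [3], [4]]
```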

Application of BFS
Breadth-first search (BFS) is an algorithm (or technique) for traversing a graph.
Following are the problems that use BFS as a building block
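1) Finding Strongly Connected Components of a graph
// Apply the same technique described above; the second pass (on the transpose graph) may use
// BFS instead of DFS, but the first pass still needs DFS to obtain the finishing order.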

2) Ford-Fulkerson method for computing the maximum flow in a flow network (BFS is used to find
augmenting paths, as in the Edmonds-Karp variant)
3) Testing bipartiteness
A bipartite graph, also called a bigraph, is a graph whose vertices can be decomposed into
two disjoint sets such that no two vertices within the same set are adjacent.
OR
A bipartite graph is a graph whose vertices can be divided into two independent
sets, U and V, such that every edge (u, v) connects a vertex in U to a vertex in V.
In other words, for every edge (u, v), either u belongs to U and v to V, or u
belongs to V and v to U. Equivalently, no edge connects two vertices of the
same set.














Following is a simple algorithm to find out whether a given graph is bipartite or
not, using Breadth First Search (BFS).
1. Assign the color RED to the source vertex (putting it into set U).
2. Color all its neighbors BLUE (putting them into set V).
3. Color all the neighbors' neighbors RED (putting them into set U).
4. In this way, assign a color to every vertex so that the constraints of the m-way
coloring problem with m = 2 are satisfied.
5. While assigning colors, if we find a neighbor that has the same color as the
current vertex, then the graph cannot be colored with 2 colors (i.e. the graph is not bipartite).
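
The coloring procedure above can be sketched in Python with a queue-based BFS. The sketch assumes an undirected graph stored as a dict of neighbor lists; it restarts from every uncolored vertex so that disconnected graphs are handled too, which goes slightly beyond the single-source description above. The function name and sample graphs are illustrative.

```python
from collections import deque


def is_bipartite(adj):
    """Return True if the undirected graph can be colored with RED and BLUE."""
    color = {}
    for source in adj:
        if source in color:
            continue
        color[source] = "RED"                       # step 1
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:                  # steps 2 and 3: alternate colors
                    color[w] = "BLUE" if color[u] == "RED" else "RED"
                    queue.append(w)
                elif color[w] == color[u]:          # step 5: conflict found
                    return False
    return True


# Hypothetical graphs: a 4-cycle is bipartite, a triangle is not.
square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(is_bipartite(square))     # True
print(is_bipartite(triangle))   # False
```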

4) Finding single-source shortest paths in an unweighted graph, where path length is measured
by the number of edges.
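
As a sketch of this use of BFS, the Python code below computes the number of edges on a shortest path from a source to every reachable vertex. The dict-of-lists format, the function name bfs_distances and the sample graph are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return {vertex: number of edges on a shortest path from source}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:           # first visit is along a shortest path
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


# Hypothetical directed graph; F cannot be reached from A.
routes = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": [], "F": ["A"]}
dist = bfs_distances(routes, "A")
print(dist)     # {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 3}
```

Vertices that cannot be reached from the source simply do not appear in the result.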


Note: Now, discuss the advantages and disadvantages of BFS and DFS and the main
differences between them.
