Properties of Graphs

Graph theory is the study of the properties of graph structures. It provides us with a language with which to talk about graphs.

The key to solving many problems is identifying the fundamental graph-theoretic notion underlying the situation and then using classical algorithms to solve the resulting problem.

Graphs are made up of vertices and edges. The simplest property of a vertex is its degree, the number of edges incident upon it.

The sum of the vertex degrees in any undirected graph is twice the number of edges, since every edge contributes one to the degree of both adjacent vertices.
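The degree-sum identity is easy to check mechanically. The following is a minimal sketch, assuming the graph arrives as a plain list of undirected edges; the function name degree_sum and the bound MAXV are illustrative choices, not part of the graph library used later in these notes.

```c
#include <assert.h>

#define MAXV 100    /* assumed upper bound on the number of vertices */

/* Fill degree[1..n] from a list of m undirected edges and
   return the sum of all vertex degrees. */
int degree_sum(int n, int m, int edges[][2], int degree[])
{
    int i, sum = 0;

    assert(n >= 1 && n <= MAXV);
    for (i = 1; i <= n; i++) degree[i] = 0;
    for (i = 0; i < m; i++) {
        degree[edges[i][0]]++;      /* every edge contributes one ... */
        degree[edges[i][1]]++;      /* ... to each of its endpoints   */
    }
    for (i = 1; i <= n; i++) sum += degree[i];
    return sum;                     /* equals 2*m for any edge list */
}
```

On a triangle with one extra pendant edge (four edges in total), the function returns 8.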

Trees are connected undirected graphs which contain no cycles. Vertex degrees are important in the analysis of trees. A leaf of a tree is a vertex of degree 1. Every n-vertex tree contains n-1 edges, so all non-trivial trees contain at least two leaf vertices.

Connectivity

A graph is connected if there is an undirected path between every pair of vertices.

The existence of a spanning tree is sufficient to prove connectivity. A breadth-first or depth-first search-based connected components algorithm can be used to find such a spanning tree.
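A connectivity test along these lines might look as follows. This is a sketch under simplifying assumptions: the graph is stored as a boolean adjacency matrix rather than the adjacency lists used elsewhere in these notes, and the names dfs and is_connected are hypothetical. The tree edges followed by the search form the spanning tree.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 100    /* assumed upper bound on the number of vertices */

/* Depth-first search marking every vertex reachable from v. */
static void dfs(int n, bool adj[MAXV+1][MAXV+1], int v, bool seen[])
{
    int w;

    seen[v] = true;
    for (w = 1; w <= n; w++)
        if (adj[v][w] && !seen[w])
            dfs(n, adj, w, seen);
}

/* The graph on vertices 1..n is connected if a search from
   vertex 1 reaches every other vertex. */
bool is_connected(int n, bool adj[MAXV+1][MAXV+1])
{
    bool seen[MAXV+1] = { false };
    int v;

    assert(n >= 1 && n <= MAXV);
    dfs(n, adj, 1, seen);
    for (v = 1; v <= n; v++)
        if (!seen[v]) return false;
    return true;
}
```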

However, there are other notions of connectivity. The most interesting special case is when there is a single weak link in the graph. A single vertex whose deletion disconnects the graph is called an articulation vertex; any graph without such a vertex is said to be biconnected. A single edge whose deletion disconnects the graph is called a bridge.

Testing for articulation vertices or bridges is easy via brute force. For each vertex/edge, delete it from the graph and test whether the resulting graph remains connected. Be sure to add that vertex/edge back before doing the next deletion!
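The brute-force test for a single vertex can be sketched as below. One design choice differs from the description above: instead of physically deleting the vertex and adding it back, the search simply refuses to step on it, which removes the risk of forgetting the restore step. The adjacency-matrix representation and the names dfs_skip and is_articulation are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 100    /* assumed upper bound on the number of vertices */

/* Mark everything reachable from v without passing through vertex skip. */
static void dfs_skip(int n, bool adj[MAXV+1][MAXV+1], int v, int skip,
                     bool seen[])
{
    int w;

    seen[v] = true;
    for (w = 1; w <= n; w++)
        if (adj[v][w] && w != skip && !seen[w])
            dfs_skip(n, adj, w, skip, seen);
}

/* True if removing vertex x leaves the other vertices disconnected.
   Assumes the original graph on vertices 1..n is connected. */
bool is_articulation(int n, bool adj[MAXV+1][MAXV+1], int x)
{
    bool seen[MAXV+1] = { false };
    int v;
    int root = (x == 1) ? 2 : 1;    /* any vertex other than x */

    assert(n >= 2 && n <= MAXV);
    dfs_skip(n, adj, root, x, seen);
    for (v = 1; v <= n; v++)
        if (v != x && !seen[v]) return true;
    return false;
}
```

Calling is_articulation for each vertex in turn gives the full brute-force test; each call costs one traversal of the graph.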

In directed graphs we are often concerned with strongly connected components, that is, partitioning the graph into chunks such that there are directed paths between all pairs of vertices within a given chunk. Road networks should be strongly connected, or else there will be places you can drive to but not drive home from without violating one-way signs.

Cycles in Graphs

All non-tree connected graphs contain cycles. Particularly interesting are cycles which visit all the edges or vertices of the graph.

An Eulerian cycle is a tour which visits every edge of the graph exactly once. A mailman's route is ideally an Eulerian cycle, so he can visit every street (edge) in the neighborhood once before returning home.

An undirected graph contains an Eulerian cycle if it is connected and every vertex is of even degree. Why? The circuit must enter and exit every vertex it encounters, implying that all degrees must be even.
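The two conditions translate directly into a test. The sketch below assumes the graph is given as a symmetric matrix of edge counts (so parallel edges are allowed, self-loops are not); the names visit and has_eulerian_cycle are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 100    /* assumed upper bound on the number of vertices */

/* Depth-first search over vertices joined by at least one edge. */
static void visit(int n, int adj[MAXV+1][MAXV+1], int v, bool seen[])
{
    int w;

    seen[v] = true;
    for (w = 1; w <= n; w++)
        if (adj[v][w] > 0 && !seen[w])
            visit(n, adj, w, seen);
}

/* adj[v][w] holds the number of edges between v and w. */
bool has_eulerian_cycle(int n, int adj[MAXV+1][MAXV+1])
{
    bool seen[MAXV+1] = { false };
    int v, w, degree;

    assert(n >= 1 && n <= MAXV);
    for (v = 1; v <= n; v++) {
        degree = 0;
        for (w = 1; w <= n; w++) degree += adj[v][w];
        if (degree % 2 != 0) return false;   /* odd degree: cannot enter and exit */
    }
    visit(n, adj, 1, seen);
    for (v = 1; v <= n; v++)
        if (!seen[v]) return false;          /* graph is not connected */
    return true;
}
```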

We can find an Eulerian cycle by building it one cycle at a time. We can find a simple cycle in the graph by finding a back edge using DFS. Deleting the edges on this cycle leaves each vertex with even degree. Once we have partitioned the edges into edge-disjoint cycles, we can merge these cycles arbitrarily at common vertices to build an Eulerian cycle.

A Hamiltonian cycle is a tour which visits every vertex of the graph exactly once. The traveling salesman problem asks for the shortest such tour on a weighted graph.

Unfortunately, no efficient algorithm is known for solving Hamiltonian cycle problems; the problem is NP-complete. If the graph is sufficiently small, it can be solved via backtracking.
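A backtracking search for small graphs could be sketched like this. It is one straightforward formulation, not a tuned solver: the adjacency-matrix input, the small bound MAXV, and the names extend and hamiltonian_cycle are all assumptions of the example. The running time is exponential in the worst case.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 20     /* assumed bound; backtracking only suits small graphs */

/* Try to extend path[0..k-1] to a full Hamiltonian cycle. */
static bool extend(int n, bool adj[MAXV+1][MAXV+1], int path[], bool used[],
                   int k)
{
    int v;

    if (k == n)                               /* every vertex is on the path */
        return adj[path[n-1]][path[0]];       /* can the tour be closed? */
    for (v = 1; v <= n; v++)
        if (!used[v] && adj[path[k-1]][v]) {
            used[v] = true;
            path[k] = v;
            if (extend(n, adj, path, used, k+1)) return true;
            used[v] = false;                  /* undo and try another vertex */
        }
    return false;
}

/* On success, path[0..n-1] lists the vertices in tour order. */
bool hamiltonian_cycle(int n, bool adj[MAXV+1][MAXV+1], int path[])
{
    bool used[MAXV+1] = { false };

    assert(n >= 3 && n <= MAXV);
    path[0] = 1;                              /* a cycle may start anywhere */
    used[1] = true;
    return extend(n, adj, path, used, 1);
}
```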

Minimum Spanning Trees

A spanning tree of a graph G = (V, E) is a subset of edges from E forming a tree connecting all vertices of V.

For edge-weighted graphs, we are particularly interested in the minimum spanning tree, the spanning tree whose sum of edge weights is the smallest possible.

Minimum spanning trees are the answer whenever we need to connect a set of points (representing cities, junctions, or other locations) by the smallest amount of roadway, wire, or pipe.

We will present Prim's algorithm here because we think it is simpler to program, and because it gives us Dijkstra's shortest path algorithm with very minimal changes.

We generalize the graph data structure to support edge-weighted graphs. Each edge-entry previously contained only the other endpoint of the given edge. We must replace this by a record allowing us to annotate the edge with weights:


typedef struct {
    int v;                          /* neighboring vertex */
    int weight;                     /* edge weight */
} edge;

typedef struct {
    edge edges[MAXV+1][MAXDEGREE];  /* adjacency info */
    int degree[MAXV+1];             /* outdegree of vertex */
    int nvertices;                  /* number of vertices */
    int nedges;                     /* number of edges in graph */
} graph;


Prim's Algorithm

Prim's algorithm grows the minimum spanning tree in stages starting from a given vertex. At each iteration, we add one new vertex into the spanning tree. A greedy algorithm suffices for correctness: we always add the lowest-weight edge linking a vertex in the tree to a vertex on the outside.

Our implementation keeps track of the cheapest edge from any tree vertex to every non-tree vertex in the graph. The cheapest edge over all remaining non-tree vertices gets added in each iteration. We must update the costs of getting to the non-tree vertices after each insertion.

The minimum spanning tree itself or its cost can be reconstructed in two different ways. The simplest method would be to augment this procedure with statements that print the edges as they are found or total the weight of all selected edges in a variable for later return. Alternatively, since the tree topology is encoded by the parent array, that array plus the original graph tells you everything about the minimum spanning tree.
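The second method can be sketched as follows: every non-root vertex v contributes the tree edge (parent[v], v). To keep the example self-contained it looks up weights in a matrix rather than in the adjacency lists of the graph type, and the function name mst_cost is hypothetical.

```c
#include <assert.h>

#define MAXV 100    /* assumed upper bound on the number of vertices */

/* Total weight of the tree encoded by parent[1..n], where parent[v]
   is -1 for the root (and for any vertex never reached) and
   weight[a][b] is the weight of edge (a,b). */
int mst_cost(int n, const int parent[], int weight[MAXV+1][MAXV+1])
{
    int v, total = 0;

    assert(n >= 1 && n <= MAXV);
    for (v = 1; v <= n; v++)
        if (parent[v] != -1)
            total += weight[parent[v]][v];   /* tree edge into v */
    return total;
}
```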

Prim's Implementation


void prim(graph *g, int start)
{
    int i;                          /* counter */
    bool intree[MAXV+1];            /* is vertex in the tree yet? */
    int distance[MAXV+1];           /* vertex distance from start */
    int v;                          /* current vertex to process */
    int w;                          /* candidate next vertex */
    int weight;                     /* edge weight */
    int dist;                       /* shortest current distance */

    /* parent[] is assumed to be a global int array of size MAXV+1 */
    for (i=1; i<=g->nvertices; i++) {
        intree[i] = FALSE;
        distance[i] = MAXINT;
        parent[i] = -1;
    }
    distance[start] = 0;
    v = start;

    while (intree[v] == FALSE) {
        intree[v] = TRUE;
        for (i=0; i<g->degree[v]; i++) {
            w = g->edges[v][i].v;
            weight = g->edges[v][i].weight;
            if ((distance[w] > weight) && (intree[w]==FALSE)) {
                distance[w] = weight;
                parent[w] = v;
            }
        }

        v = start;                  /* already in tree: ends the loop if nothing is found */
        dist = MAXINT;
        for (i=1; i<=g->nvertices; i++)     /* scan ALL vertices, including vertex 1 */
            if ((intree[i]==FALSE) && (dist > distance[i])) {
                dist = distance[i];
                v = i;
            }
    }
}


Dijkstra's Algorithm for Shortest Paths

BFS does not suffice for finding shortest paths in weighted graphs, because the shortest weighted path from s to t does not necessarily contain the fewest number of edges.

Dijkstra's algorithm is the method of choice for finding the shortest path between two vertices in an edge- and/or vertex-weighted graph. Given a particular start vertex s, it finds the shortest path from s to every other vertex in the graph, including your desired destination t.

The basic idea is very similar to Prim's algorithm. In each iteration, we are going to add exactly one vertex to the tree of vertices for which we know the shortest path from s.

The difference between Dijkstra's and Prim's algorithms is how they rate the desirability of each outside vertex. In shortest path, we want to include the outside vertex which is closest (in shortest-path distance) to the start. This is a function of both the new edge weight and the distance from the start of the tree-vertex it is adjacent to.

In fact, this change is very minor. Below we give an implementation of Dijkstra's algorithm based on changing exactly three lines from our Prim's implementation - one of which is simply the name of the function!

Implementation of Dijkstra


void dijkstra(graph *g, int start)      /* WAS prim(g,start) */
{
    int i;                          /* counter */
    bool intree[MAXV+1];            /* is vertex in the tree yet? */
    int distance[MAXV+1];           /* vertex distance from start */
    int v;                          /* current vertex to process */
    int w;                          /* candidate next vertex */
    int weight;                     /* edge weight */
    int dist;                       /* shortest current distance */

    /* parent[] is assumed to be a global int array of size MAXV+1 */
    for (i=1; i<=g->nvertices; i++) {
        intree[i] = FALSE;
        distance[i] = MAXINT;
        parent[i] = -1;
    }
    distance[start] = 0;
    v = start;

    while (intree[v] == FALSE) {
        intree[v] = TRUE;
        for (i=0; i<g->degree[v]; i++) {
            w = g->edges[v][i].v;
            weight = g->edges[v][i].weight;
/* CHANGED */    if (distance[w] > (distance[v]+weight)) {
/* CHANGED */            distance[w] = distance[v]+weight;
                parent[w] = v;
            }
        }

        v = start;                  /* already in tree: ends the loop if nothing is found */
        dist = MAXINT;
        for (i=1; i<=g->nvertices; i++)     /* scan ALL vertices, including vertex 1 */
            if ((intree[i]==FALSE) && (dist > distance[i])) {
                dist = distance[i];
                v = i;
            }
    }
}


How do we use dijkstra to find the length of the shortest path from start to a given vertex t? This is exactly the value of distance[t]. How can we reconstruct the actual path? By following the backward parent pointers from t until we hit start (or -1 if no such path exists).
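Following the parent pointers yields the path in reverse order, so a small helper has to turn it around. The sketch below takes the parent array as a parameter so that it stands alone; the name find_path and the convention of returning 0 for "no path" are assumptions of this example.

```c
#include <assert.h>

#define MAXV 100    /* assumed upper bound on the number of vertices */

/* Write the vertices of the path start..t into path[] and return how
   many there are, or 0 if t was never reached from start.
   path[] needs room for MAXV entries. */
int find_path(int start, int t, const int parent[], int path[])
{
    int len = 0, v, i, tmp;

    for (v = t; v != -1; v = parent[v]) {    /* walk from t back toward start */
        assert(len < MAXV);                  /* a valid parent array has no cycles */
        path[len++] = v;
    }
    if (path[len-1] != start) return 0;      /* the chain ended somewhere else */

    for (i = 0; i < len/2; i++) {            /* reverse into start..t order */
        tmp = path[i];
        path[i] = path[len-1-i];
        path[len-1-i] = tmp;
    }
    return len;
}
```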

Unlike Prim's, Dijkstra's algorithm only works on graphs without negative-cost edges. Most applications do not feature negative-weight edges, making this discussion academic.

All-Pairs Shortest Path

Many applications need to know the length of the shortest path between all pairs of vertices in a given graph. For example, suppose you want to find the "center" vertex, the one which minimizes the longest or average distance to all the other nodes. This might be the best place to start a new business.

We could solve this problem by calling Dijkstra's algorithm from each of the possible starting vertices. But Floyd's all-pairs shortest-path algorithm is an amazingly slick way to construct this distance matrix from the original weight matrix of the graph.

Floyd's algorithm is best employed on an adjacency matrix data structure, which is no extravagance since we have to store all pairwise distances anyway. Our adjacency_matrix type allocates space for the largest possible matrix, and keeps track of how many vertices are in the graph:


typedef struct {
    int weight[MAXV+1][MAXV+1];    /* adjacency/weight info */
    int nvertices;                 /* number of vertices in graph */
} adjacency_matrix;


A critical issue in any adjacency matrix implementation is how we denote the edges which are not present in the graph. For unweighted graphs, a common convention is that graph edges are denoted by 1 and non-edges by 0. This gives exactly the wrong interpretation if the numbers denote edge weights, for the non-edges get interpreted as a free ride between vertices. Instead, we should initialize each non-edge to MAXINT.


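void initialize_adjacency_matrix(adjacency_matrix *g)   /* function name assumed */
{
    int i,j;                      /* counters */

    g->nvertices = 0;

    for (i=1; i<=MAXV; i++)
        for (j=1; j<=MAXV; j++)
            g->weight[i][j] = MAXINT;     /* MAXINT marks a missing edge */
}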


Floyd's algorithm starts by numbering the vertices of the graph from 1 to n, using these numbers not to label the vertices but to order them.

We will perform n iterations, where the kth iteration allows only the first k vertices as possible intermediate steps on the path between each pair of vertices i and j. When k = 0, we are allowed no intermediate vertices, so the only allowed paths consist of the original edges in the graph. Thus the initial all-pairs shortest-path matrix consists of the initial adjacency matrix. At each iteration, we allow a richer set of possible shortest paths. Allowing the kth vertex as a new possible intermediary helps only if there is a short path that goes through k, so

    W[i,j]^k = min( W[i,j]^(k-1), W[i,k]^(k-1) + W[k,j]^(k-1) )

where W[i,j]^k denotes the length of the shortest path from i to j using only vertices 1 to k as intermediates.

Implementation of Floyd's Algorithm

The correctness of this is somewhat subtle, and we encourage you to convince yourself of it. But there is nothing subtle about how short and sweet the implementation is:


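void floyd(adjacency_matrix *g)         /* function name assumed */
{
    int i,j;              /* dimension counters */
    int k;                /* intermediate vertex counter */
    int through_k;        /* distance through vertex k */

    for (k=1; k<=g->nvertices; k++)
        for (i=1; i<=g->nvertices; i++)
            for (j=1; j<=g->nvertices; j++) {
                /* skip missing legs: adding to MAXINT could overflow */
                if ((g->weight[i][k] == MAXINT) || (g->weight[k][j] == MAXINT))
                    continue;
                through_k = g->weight[i][k]+g->weight[k][j];
                if (through_k < g->weight[i][j])
                    g->weight[i][j] = through_k;
            }
}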
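
Once the distance matrix has been filled in, the "center" vertex mentioned at the start of this section can be read off in one pass. The sketch below uses the "minimize the longest distance" definition and takes a bare matrix so that it stands alone; the name center_vertex and the choice of MAXINT as INT_MAX are assumptions of the example.

```c
#include <assert.h>
#include <limits.h>

#define MAXV   100        /* assumed upper bound on the number of vertices */
#define MAXINT INT_MAX    /* assumed marker for "no path" */

/* Return the vertex whose farthest other vertex is as near as possible,
   or -1 if every vertex has some vertex it cannot reach.
   dist[i][j] is the shortest-path length from i to j. */
int center_vertex(int n, int dist[MAXV+1][MAXV+1])
{
    int i, j, farthest;
    int best = -1, best_farthest = MAXINT;

    assert(n >= 1 && n <= MAXV);
    for (i = 1; i <= n; i++) {
        farthest = 0;
        for (j = 1; j <= n; j++)
            if (j != i && dist[i][j] > farthest)
                farthest = dist[i][j];
        if (farthest < best_farthest) {      /* ties go to the lower number */
            best_farthest = farthest;
            best = i;
        }
    }
    return best;
}
```

Replacing the inner maximum by a sum gives the "average distance" variant instead.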


Transitive Closure

Floyd's algorithm has another important application, that of computing the transitive closure of a directed graph. In analyzing a directed graph, we are often interested in which vertices are reachable from a given node.

For example, consider the blackmail graph defined on a set of people, where there is a directed edge (i, j) if i has sensitive-enough private information on j so that i can get j to do whatever i wants. You wish to hire one of these people to be your personal representative. Who has the most power in terms of blackmail potential?

A simplistic answer would be the vertex of highest degree, but an even better representative would be the person who has blackmail chains to the most other parties. Steve might only be able to blackmail Miguel directly, but if Miguel can blackmail everyone else then Steve is the man you want to hire.

The vertices reachable from any single node can be computed using breadth-first or depth-first search. But the whole batch can be computed as an all-pairs shortest-path problem. If the shortest path from i to j remains MAXINT after running Floyd's algorithm, you can be sure there is no directed path from i to j. Any vertex pair of weight less than MAXINT must be reachable, both in the graph-theoretic and blackmail senses of the word.
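When only reachability matters, the same triple loop can run on booleans instead of distances. This boolean variant is usually called Warshall's algorithm; it is a substitute for the MAXINT-based approach described above, not the same code. The names transitive_closure and most_powerful are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 100    /* assumed upper bound on the number of vertices */

/* On entry reach[i][j] is true for each directed edge (i,j).
   On exit it is true whenever a directed path leads from i to j. */
void transitive_closure(int n, bool reach[MAXV+1][MAXV+1])
{
    int i, j, k;

    assert(n >= 1 && n <= MAXV);
    for (k = 1; k <= n; k++)
        for (i = 1; i <= n; i++)
            for (j = 1; j <= n; j++)
                if (reach[i][k] && reach[k][j])
                    reach[i][j] = true;
}

/* Vertex that can reach the most other vertices (lowest number on ties). */
int most_powerful(int n, bool reach[MAXV+1][MAXV+1])
{
    int i, j, count, best = 1, best_count = -1;

    for (i = 1; i <= n; i++) {
        count = 0;
        for (j = 1; j <= n; j++)
            if (j != i && reach[i][j]) count++;
        if (count > best_count) {
            best_count = count;
            best = i;
        }
    }
    return best;
}
```

In the blackmail example, if person 1 can blackmail only person 2 while person 2 can blackmail persons 3 and 4, the closure shows that person 1 reaches all three others and is the one to hire.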

Bipartite Matching and Network Flow

Any edge-weighted graph can be thought of as a network of pipes, where the weight of edge (i, j) measures the capacity of the pipe. For a given weighted graph G and two vertices s and t, the network flow problem asks for the maximum amount of flow which can be sent from s to t while respecting the maximum capacities of each pipe.

While the network flow problem is of independent interest, its primary importance is that of being able to solve other important graph problems. A classic example is bipartite matching. A matching in a graph G = (V, E) is a subset of edges E' of E such that no two edges in E' share a vertex. Thus a matching pairs off certain vertices such that every vertex is in at most one such pair.

Graph G is bipartite or two-colorable if the vertices can be divided into two sets, say, L and R, such that all edges in G have one vertex in L and one vertex in R. Many naturally defined graphs are bipartite. For example, suppose certain vertices represent jobs to be done and the remaining vertices people who can potentially do them. The existence of edge (j, p) means that job j can potentially be done by person p. Or let certain vertices represent boys and certain vertices girls, with edges representing compatible pairs. Matchings in these graphs have natural interpretations as job assignments or as marriages.
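Whether a graph is two-colorable can be tested with breadth-first search: color each newly discovered vertex the opposite of the vertex that discovered it, and report failure if any edge joins two vertices of the same color. The following is a sketch assuming an adjacency-matrix input; the name two_color and the color encoding (0, 1, and -1 for uncolored) are choices made for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 100    /* assumed upper bound on the number of vertices */

/* Try to assign color[v] in {0,1} so that every edge joins two
   different colors. Returns false if the graph is not bipartite. */
bool two_color(int n, bool adj[MAXV+1][MAXV+1], int color[])
{
    int queue[MAXV+1];
    int head, tail, s, v, w;

    assert(n >= 1 && n <= MAXV);
    for (v = 1; v <= n; v++) color[v] = -1;

    for (s = 1; s <= n; s++) {               /* handle every component */
        if (color[s] != -1) continue;
        color[s] = 0;
        head = tail = 0;
        queue[tail++] = s;
        while (head < tail) {
            v = queue[head++];
            for (w = 1; w <= n; w++) {
                if (!adj[v][w]) continue;
                if (color[w] == -1) {
                    color[w] = 1 - color[v]; /* opposite side from v */
                    queue[tail++] = w;
                } else if (color[w] == color[v]) {
                    return false;            /* edge inside one side */
                }
            }
        }
    }
    return true;
}
```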

The largest possible bipartite matching can be found using network flow. See the textbook for details.
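For small instances, a general network flow code is not strictly needed. The sketch below uses the simpler augmenting-path method for bipartite matching, which corresponds to running network flow with every capacity equal to one. It is not the textbook's implementation; the names try_augment and max_bipartite_matching, and the matrix layout adj[u][v] for "left vertex u is compatible with right vertex v", are assumptions of this example.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 100    /* assumed upper bound on vertices per side */

/* Try to find a partner for left vertex u, re-matching earlier
   left vertices along an augmenting path if necessary. */
static bool try_augment(int nr, bool adj[MAXV+1][MAXV+1], int u,
                        bool visited[], int match_r[])
{
    int v;

    for (v = 1; v <= nr; v++)
        if (adj[u][v] && !visited[v]) {
            visited[v] = true;
            if (match_r[v] == 0 ||
                try_augment(nr, adj, match_r[v], visited, match_r)) {
                match_r[v] = u;              /* v is now paired with u */
                return true;
            }
        }
    return false;
}

/* Returns the size of a maximum matching between left vertices 1..nl
   and right vertices 1..nr. match_r[v] is the left partner of right
   vertex v, or 0 if v is unmatched. */
int max_bipartite_matching(int nl, int nr, bool adj[MAXV+1][MAXV+1],
                           int match_r[])
{
    bool visited[MAXV+1];
    int u, v, size = 0;

    assert(nl >= 0 && nl <= MAXV && nr >= 0 && nr <= MAXV);
    for (v = 1; v <= nr; v++) match_r[v] = 0;
    for (u = 1; u <= nl; u++) {
        for (v = 1; v <= nr; v++) visited[v] = false;
        if (try_augment(nr, adj, u, visited, match_r)) size++;
    }
    return size;
}
```

With jobs on the left and people on the left's compatible right side, the returned size is the largest number of jobs that can be assigned at once, and match_r gives one such assignment.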

Assigned Problems

111001 (Freckles) - Connect the dots using as little ink as possible. What classical graph problem does this correspond to?

111002 (The Necklace) - Does there exist a way to lace up bicolored beads so that each pair of neighboring bead-faces share a color? What classical graph problem does this correspond to? What efficiently computed classical graph problem does this also correspond to?

111006 (Tourist Guide) - What vertices in the graph separate the graph into two pieces, i.e. all paths between a and b must go through them for some vertices a and b? How can we efficiently test whether v is such a vertex?

111007 (The Grand Dinner) - Match team members to tables so that no two team members sit at the same table. Can this be done using bipartite matching/network flow?   