The Minimum Spanning Tree of an Undirected Graph 

A graph often contains redundancy in that there can be multiple paths between two vertices. This redundancy may be desirable, for example to offer alternative routes in the case of breakdown or overloading of an edge (road, connection, phone line) in a network. However, we often require the cheapest subnetwork that connects the vertices of a given graph. This must in fact be an unrooted tree, because there is only one path between any two vertices in a tree; if there is a cycle then at least one edge can be removed. The total cost or weight of a tree is the sum of the weights of the edges in the tree. We assume that the weight of every edge is greater than zero. Given a connected, undirected graph G=<V,E>, the minimum spanning tree problem is to find a tree T=<V,E'> such that E' subset_of E and the cost of T is minimal. Note that a minimum spanning tree is not necessarily unique. Recall that a tree over V vertices contains V1 edges. A tree can be represented by an array of this many edges. Note on TreesAn unrooted tree is a connected, acyclic, undirected graph. Each leaf vertex has degree one, i.e. is connected to one other vertex. An unrooted tree can be made into a rooted tree: If the unrooted tree is "floppy" and it is "picked up" by a leaf to make a root, the new root has one child, every internal vertex has at least one child, and every (other) leaf has no children. If all internal vertices of the unrooted tree have degree three, then the corresponding rooted tree is a binary tree. If the unrooted tree is picked up by an internal vertex, the new root has three children, every other internal vertex has two children, and every leaf has no children. Prim's AlgorithmPrim's algorithm for the minimum spanning tree problem follows the strategy of beginning with a small tree, i.e. <{v_{1}},{ }>, and growing it until it includes all vertices in the given graph. Initially the tree contains just an arbitrary starting node v_{1}. At each stage a vertex not yet in the tree but closest to (some vertex that is in) the tree is found. This closest vertex is added to the tree. Finding the vertex allows us to improve our knowledge of the distances of the remaining vertices to the tree. A set `done' contains the vertices in the tree.
Graph G = <V, E>
done := {v_{1}} initial Tree is <{v_{1}},{}>
for vertex i in V{v_{1}}
T[i] := E[1,i] direct edges (possibly "missing")
end for
loop V1 times
Inv: {T[v]v in done} represents a min' spanning Tree
 of the nodes in done and
 {T[u]u not in done} contains the shortest known
 distances from the (sub)spanning Tree to
 such vertices u.
find closest vertex to (sub)spanning Tree in V  done
done +:= {closest}
add closest & edge T[closest] to (sub)spanning Tree
for vertex j in V  done
T[j] := min(T[j], update knowledge on paths,
E[closest,j]) perhaps better?
end for
end loop
 Prim's Minimum Spanning Tree Algorithm 
ComplexityPrim's algorithm is very similar to Dijkstra's singlesource shortest path algorithm and indeed the given version of the former was created by simple edits to the latter. The principal difference is that the criteria for choosing a new vertex is proximity to the tree rather than to a source. Prim's algorithm also clearly takes O(V^{2}) time. CorrectnessPrim's algorithm is correct because at each stage it has built a minimum spanning tree over those vertices in the set `done' which eventually contains all the vertices: (i) The condition is trivially true initially. (ii) Consider the addition of a "closest" vertex `v' by an edge `e' to the partial spanning tree T at some intermediate stage. Vertex v must be in the final minimum spanning tree T' of all G. Suppose v could be connected into T' by some other path, not requiring e, for a smaller total cost. Now add e to T'. This would create a cycle through part of T. The cycle could be broken by deleting an edge out of T into the rest of T' of higher weight than e, because e is chosen as the lowest cost edge out of T to a vertex not in done. Thus T' could not be a minimum spanning tree of G, i.e. a contradiction, so the supposition is false. Kruskal's AlgorithmKruskal's algorithm for the minimum spanning tree problem begins with many disjoint spanning trees, a spanning forest. It repeatedly joins two trees together until a spanning tree of the entire given graph remains.
G = <V,E>
P := {{v_{1}}, ..., {v_{n}}} partition V into singleton trees
E' := {}
loop V1 times
Inv: E' contains only edges of a min' span' tree for each S in P &
 each S in P represents a subtree of a minimum spanning tree of G
find shortest edge e joining different subsets S1 and S2 in P
E' += {e}
P := P  {S1,S2} + {S1 union S2}
end loop
 Kruskal's Minimum Spanning Tree Algorithm, O(E*log(E)) time 
CorrectnessKruskal's algorithm certainly leads to a spanning tree T but it is necessary to prove that T is minimal. The invariant shows that this is so: The invariant is certainly true on the initial iteration. In the body of the loop, two partial minimum spanning trees T1 and T2 are joined by an edge `e'. The vertices in T1 and T2 must be connected somehow in a final minimum spanning tree T'. Suppose they could be connected more cheaply in T' than in T. Add e to T'. This would create a cycle but it could be broken by removing an edge from T to T'T. This would leave the vertices in T1 and T2 connected but at a lower cost than in T' because e is chosen as the cheapest satisfactory edge. Therefore T1, T2 and e form a minimum spanning subtree which must be part of a minimum spanning tree of G. The invariant really is invariant. The algorithm continues to join subtrees until a single minimum cost tree remains, spanning all vertices. ComplexityThe time complexity of Kruskal's algorithm hinges on finding the shortest edge to join two subtrees and on the joining itself. A priority queue can be used to organise the edges. A heap is a suitable implementation; see heapsort but remember that we require fast access to the smallest edge. The priority queue takes O(ElogE) time to create and O(logE) time to find a shortest edge while maintaining the priority queue. The later is done V1 times. The joining of two subtrees can be done in O(VlogV) total time over all the iterations as seen below. Thus Kruskal's algorithm takes O(ElogE) time overall. This is better than Prim's algorithm for sparse graphs for which E<<V^{2}.
The forest is represented by a partition of the vertices of the graph. Each partial tree spans a subset of the vertices. An array gives the size and first member of each subset. A second array gives the subset of each member and links the members of a subset together.
DemonstrationThe HTML FORM below allows a random edgeweighted, undirected graph with V vertices to be created, manipulated and analysed. The value `pr' is the probability of there being an edge <i,j>; it controls the sparseness of the graph and on average there will be pr*V*(V1) edges: controls:Note that the edges, (v_{i},v_{j}), in the tree produced by Prim's algorithm are sorted on v_{j}, because the algorithm works by considering how best to incorporate each v_{j} into the tree, gradually refining this knowledge (the tree starts as <{v_{1}},{ }>). On the other hand, the edges in the tree produced by Kruskal's algorithm are sorted by their weight, because it considers short edges first. Notes


↑ © L. Allison, www.allisons.org/ll/ (or as otherwise indicated). Created with "vi (Linux)", charset=iso88591, fetched Tuesday, 21Sep2021 23:05:49 EDT. Free: Linux, Ubuntu operatingsys, OpenOffice officesuite, The GIMP ~photoshop, Firefox webbrowser, FlashBlock flash on/off. 