From Math Images
- Kruskal’s Algorithm finds a minimum spanning tree in a connected graph with edge weights.
Basic DescriptionA minimum spanning tree (MST) is an acyclic sub-graph that connects all of the vertices in the graph with minimum total edge weight. This algorithm first sorts the edges by their edge weights. It adds each edge into the MST, starting with the one with the smallest weight, and after each addition checks to see if there is a cycle in the new tree. If there is a cycle, it removes the edge that it just added that made the cycle. Kruskal’s algorithm is guaranteed to find a minimum spanning tree, but it may not be the only MST possible for the graph.
A More Mathematical Explanation
- Note: understanding of this explanation requires: *Graphs, Pointers, Running Time
The Pseudo Code<span class="_togglegroup _toggle_initshow _toggle _toggler toggle-visible" style [...]
The Pseudo Code
The Underlying Data Structure
From looking through the algorithm, we can see that the first thing that happens is calling makeset() for each vertex. Translating this into running time we have . Next we sort the edges: . For each edge we have to call find on its two vertices, and then possibly call union. Thus the running time for this portion of code is . We know that we will only have to call union times, because after that all of the vertices are in the same set.
Makeset() only has to set the parent pointer and rank of the vertex, so it is a constant time operation.
The find() operation is a little more complicated. Because a rank k node is created by merging two trees that both have roots on rank k-1, we can say that there are at least 2k nodes in a tree of rank k. Since all rank k nodes have at least 2k descendants and no nodes of the same rank share descendants, we can say that there are no more than the total number of nodes over 2k nodes of each rank k. This means that the tree's height is no greater than .
Other than calling find, union() is a constant time operation. Find is called twice, but because of the properties of running times, constants do not matter. The running time for union is therefore the same as the running time for find.
Putting it all together we have a running time of . In an undirected graph there are at most edges, so E can be reduced to . Also, log(x2) can be reduced to 2*log(x), which is just log(x). This gives us a running time of .
Here we can see that find() and the initial sorting are the bottle neck of this running time. But what if the edges are already sorted by edge weights? We already know that we can improve the find function's running time with path compression. The value log*(n) is defined as how many times you must take the log of a number for it to be less than or equal to 1. log* time is essentially constant because log*(x) is only more than 5 if x > 265536. Using path compression, the running time of find is . So if our edges are already sorted, this makes the running time of the entire algorithm essentially linear.
Why it works
When each edge is added, to comply with the cut property, it must be the least weighted edge that connects two components that are otherwise not connected. Since the algorithm tries to add the edge with the lowest possible weight, it cannot break the first requirement. Also, if the components were connected already, there will be a cycle and the algorithm will not add the edge to the MST. We know that the edge that is already connecting the components has a smaller weight, because it was added first.
Why It's InterestingMinimum spanning trees can be useful in networking. In this case one wants all of the computers or phone lines to be connected to one another, but at a minimal cost. The weights on the edges therefore can represent a maintenance cost or usage time, whatever is being optimized.
- There are currently no teaching materials for this page. Add teaching materials.
Leave a message on the discussion page by clicking the 'discussion' tab at the top of this image page.