NoSQL Zone is brought to you in partnership with:

I've been working on web based projects built mainly with PHP and JavaScript, where I mostly use Zend Framework and jQuery. I am interested in any webpage optimizations techniques - for a faster web! Stoimen is a DZone MVB and is not an employee of DZone and has posted 96 posts at DZone. You can read more from them at their website. View Full User Profile

Algorithm of the Week: Topological Sort Revisited

12.11.2012
| 10073 views |
  • submit to reddit

Introduction

We already know what about the topological sort of a directed acyclic graph. So why do we need to revisit this algorithm? First of all I never mentioned its complexity, thus to understand why we do need a revision let’s get again on the algorithm.

We have a directed acyclic graph (DAG). There are no cycles so we must go for some kind of order putting all the vertices of the graph in such an order, that if there’s a directed edge (u, v), u must precede v in that order.

Topological Sort

 

The process of putting all the vertices of the DAG in such an order is called topological sorting. It’s commonly used in task scheduling or while finding the shortest paths in a DAG.

The algorithm itself is pretty simple to understand and code. We must start from the vertex (vertices) that don’t have predecessors.

Topological Sort - step 1

 

We put them in our sorted list in random order. Since they don’t depend on each other we can assume they are equally sorted already. Indeed thinking of a task schedule if there are tasks that don’t have predecessors (they don’t depend on other tasks before them) and that don’t depend on each other we can put them in random order (and execute them in random order).

Topological Sort - Order

 

Once we have the vertices with no predecessors we must remove the edges starting from them. Then – go again with the vertices with no predecessors.

Topological Sort - step 2

 

It’s as simple as that, so why do we need a revision of this algorithm? Well, basically because of its efficiency.

Overview

As we know most of the graph algorithms depend on the way the graph is represented in our application. We consider as the two main representations the adjacency matrix …

Adjacency Matrix

 

… and adjacency lists.

Adjacency Lists

 

Let’s first take a look of some of the main approaches to get the topologically sorted list at the end of the algorithm.

What can we do in order to find the vertices with no predecessors? We can only scan the entire list of vertices.

Adjacency Matrix

In case we’re using adjacency matrix we need|V|^2 space to store the graph. To find the vertices with no predecessors we have to scan the entire graph, which will cost us O(|V|^2) time. And we’ll have to do that |V| times. This will be |V|^3 time consuming algorithm and for dense graphs this will be quite an ineffective algorithm.

Adjacency Lists

What about the adjacency list? There we need |E| space to store a directed graph. How fast can we find a node with no predecessor? Practically we’ll need O(|E|) time. Thus in the worst case we have again O(|V|^2) time consuming programs.

So what can be done in order to optimize this algorithm?

Practically we can start by picking up a random vertex and “go back” until we get a node with no predecessors. This approach can be very effective yet also very ineffective. First of all if we have to scan all the way back to a node with no predecessors this will cost us |V| time, but if we stuck on a node that don’t have a preceding node then we’ll have a constant speed.

This means that we can modify the algorithm a bit in order to improve a lot the algorithm. We just need to store both incoming and outgoing edges and slightly modify the adjacency lists.

Modified Adjacency Lists

 

What’s the algorithm now?

First we easily find the nodes with no predecessors. Then, using a queue, we can keep the nodes with no predecessors and on each dequeue we can remove the edges from the node to all other nodes.

Pseudo Code

 Represent the graph with two lists on each vertex (incoming edges and outgoing edges)
 Make an empty queue Q;
 Make an empty topologically sorted list T;
 Push all items with no predecessors in Q;
 While Q is not empty
   a. Dequeue from Q into u;
   b. Push u in T;
   c. Remove all outgoing edges from u;
 Return T;

This approach will give us a better performance than the “brute force” approach. The running time complexity is O(|V| + |E|). The problem is that we need additional space and an operational queue, but this approach is a perfect example of how by using additional space you can get a better performing algorithm.



Published at DZone with permission of Stoimen Popov, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Boyko Bantchev replied on Wed, 2012/12/19 - 6:45am

The loop in your pseudo-code only removes nodes from the queue but never adds to it, as it should :)  If corrected, it will be an instance of a breadth-first graph traversal.
But in fact a better option is to do depth-first traversal.  Start with an empty list T, and for each node with no predecessors initiate a DF traversal.  Each time you abandon a node because it has no non-visited successors, add that node to the front of T.  When you are done, T is a topologically sorted list of the nodes of the graph.  This algorithm runs in linear time and needs no additional space except 1 bit per node to keep track of the visited nodes.

Allen Coin replied on Wed, 2012/12/19 - 9:42am in response to: Boyko Bantchev

 Good point; that is a good way of doing it.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.