DAGs and Topological Sort
How acyclic structure enables ordered execution — and the two algorithms that exploit it
Learning Objectives
By the end of this module you will be able to:
- Define a Directed Acyclic Graph (DAG) and identify one in a diagram.
- State and explain why topological sort exists if and only if the graph is a DAG.
- Execute Kahn's algorithm on a small DAG and explain the role of in-degree tracking.
- Execute the DFS-based topological sort algorithm and explain the postorder reversal step.
- State the time and space complexity of both algorithms.
- Identify at least three real-world software engineering problems that reduce to topological ordering.
Core Concepts
Directed Acyclic Graphs
A Directed Acyclic Graph (DAG) is a directed graph where following the directions of the edges will never form a closed loop. Every edge points from one vertex to another, and no sequence of edges ever leads back to a vertex already visited. That single constraint — no cycles — is what makes DAGs special and practically powerful.
DAGs differ from general directed graphs only in this one property, but that property unlocks a great deal: it becomes possible to establish a globally consistent ordering over all vertices, something cycles fundamentally prevent.
Topological Ordering
Topological sorting produces a linear ordering of all vertices such that for every directed edge u → v, vertex u appears before vertex v in the ordering. The ordering respects the entire dependency structure of the graph: nothing comes before something it depends on.
For every directed edge u → v, u must appear before v in the topological order.
A DAG can have multiple valid topological orderings (whenever two vertices have no dependency relationship between them, either can come first). The algorithms below find one valid ordering — not necessarily the unique one.
The If-and-Only-If Condition
A topological ordering exists if and only if the graph is a DAG. This is not just a practical observation — it is a mathematical necessity.
The reason cycles break topological ordering is straightforward: in a cycle u → v → w → u, you would need u before v, v before w, and w before u simultaneously. Those three constraints are mutually contradictory. No linear sequence can satisfy all of them at once. As soon as a cycle exists, topological ordering becomes impossible by the transitivity of the ordering constraint.
Conversely, any DAG has at least one valid topological ordering, and it can be computed in linear time. This guarantee is what makes DAGs so useful: you can always find a safe execution order.
Cycle detection and topological sorting are dual problems. If you run a topological sort and it succeeds, the graph is acyclic. If it fails partway through, a cycle exists. Both Kahn's algorithm and the DFS approach expose this duality directly.
Step-by-Step Procedure
Algorithm 1: Kahn's Algorithm (BFS-based)
Kahn's algorithm builds the topological order by repeatedly identifying and removing vertices that have no remaining prerequisites — vertices whose in-degree has dropped to zero.
In-degree is the count of incoming edges for a vertex. A vertex with in-degree zero has no dependencies: nothing must come before it.
Steps:
- Compute the in-degree for every vertex.
- Enqueue all vertices with in-degree zero into a queue.
- While the queue is not empty:
- Dequeue a vertex u. Append u to the result list.
- For each neighbor v of u, decrement v's in-degree by one.
- If v's in-degree reaches zero, enqueue v.
- If the result list contains all vertices, you have a valid topological order.
- If the result list is shorter than the total number of vertices, the remaining vertices are part of a cycle — the graph is not a DAG.
Decision point at step 3: when multiple vertices have in-degree zero simultaneously, any of them may be dequeued next. This is where multiple valid orderings arise.
Complexity: Θ(V + E) — every vertex is enqueued and dequeued exactly once, and every edge is examined exactly once when decrementing in-degrees.
Algorithm 2: DFS-based Topological Sort
The DFS-based approach uses the postorder finish time of depth-first search. A vertex is added to the result only after all vertices reachable from it have been fully explored — meaning all its dependencies in the output direction have already been placed.
Steps:
- Mark all vertices as unvisited.
- For each unvisited vertex u (in any order):
- Recursively visit all unvisited neighbors of u (DFS).
- After returning from all recursive calls, push u onto a stack.
- Pop all vertices from the stack to produce the topological order.
Why does postorder reversal work? A vertex u is pushed to the stack only after every vertex reachable from u has already been pushed. So when you reverse the stack, u appears before all vertices it points to. This is exactly the topological property.
Decision point: if the graph has a cycle, DFS will encounter a back edge — an edge pointing to an ancestor still on the recursion stack. You can detect this by tracking a "currently in stack" flag per vertex separately from the "visited" flag.
Complexity: O(V + E) time, O(V) auxiliary space for the visited array and the stack. Every vertex and edge is visited exactly once.
Worked Example
Consider this DAG representing a simplified build system:
compile-utils --> compile-core --> link --> package
/
compile-tests ---------
Edges: compile-utils → compile-core, compile-core → link, compile-tests → link, link → package.
Kahn's algorithm walkthrough:
| Step | In-degrees | Queue | Output |
|---|---|---|---|
| Init | compile-utils=0, compile-core=1, compile-tests=0, link=2, package=1 | [compile-utils, compile-tests] | [] |
| Dequeue compile-utils | compile-core drops to 0 | [compile-tests, compile-core] | [compile-utils] |
| Dequeue compile-tests | link drops to 1 | [compile-core] | [compile-utils, compile-tests] |
| Dequeue compile-core | link drops to 0 | [link] | [compile-utils, compile-tests, compile-core] |
| Dequeue link | package drops to 0 | [package] | [..., link] |
| Dequeue package | — | [] | [..., package] |
Final order: compile-utils → compile-tests → compile-core → link → package
All five vertices are in the output, so no cycle exists. This is a valid build order.
DFS walkthrough (same graph):
Starting DFS from compile-utils:
- Visit
compile-utils→ recurse intocompile-core- Visit
compile-core→ recurse intolink- Visit
link→ recurse intopackage- Visit
package→ no neighbors → pushpackage
- Visit
- Return → push
link
- Visit
- Return → push
compile-core
- Visit
- Return → push
compile-utils - Now visit
compile-tests(unvisited) →linkalready visited → pushcompile-tests
Stack (bottom to top): package, link, compile-core, compile-utils, compile-tests
Popped order: compile-tests, compile-utils, compile-core, link, package
Both are valid topological orderings of the same DAG.
Annotated Case Study
npm's Dependency Resolver
When you run npm install in a Node.js project, the package manager must determine which packages to install and in what order. This is a direct application of DAG-based dependency resolution.
The graph: packages are vertices. A directed edge from package A to package B means "A depends on B" — B must be installed before A can work.
Why the graph must be a DAG: if package A depends on B and B depends on A, neither can be installed first. This is a cycle, and it makes the dependency graph unresolvable. Package managers enforce DAG structure by rejecting or flagging circular dependencies.
What topological sort produces: a valid installation sequence. Every package is installed only after all its own dependencies are already available. Without a topological ordering, the installer would risk installing a package before its prerequisites exist.
Where it gets more complex: real package managers handle version constraints, optional dependencies, and peer dependencies — but the core ordering problem reduces to topological sort on a DAG. The same pattern appears in Python's pip, Rust's cargo, and Java's Maven.
Git stores commit history as a DAG: each commit points to its parent commit(s). This is why git log can traverse history in a consistent order, and why merges create vertices with two parents. There are no cycles because you cannot be your own ancestor.
Compare & Contrast
Kahn's Algorithm vs. DFS-based Topological Sort
| Dimension | Kahn's Algorithm | DFS-based Sort |
|---|---|---|
| Core mechanism | BFS, in-degree tracking | DFS, postorder stack |
| Time complexity | Θ(V + E) | O(V + E) |
| Space complexity | O(V) for queue + in-degree array | O(V) for stack + visited array |
| Cycle detection | Natural: leftover vertices with non-zero in-degree signal a cycle | Requires explicit back-edge detection via a "in-stack" flag |
| Output order | Processes vertices with zero in-degree first — "sources first" | Processes deepest vertices first, reverses — "sinks first, then reversed" |
| Multiple valid orderings | Queue ordering controls which valid ordering is produced | DFS start order and adjacency list order affect the result |
| Parallelism insight | Vertices in the queue at the same step can run concurrently | Does not directly surface concurrent candidates |
| Preferred when | You need to expose which tasks can run in parallel, or you want explicit cycle detection integrated into the sort | You already have a DFS framework and want to extend it with minimal code |
Both algorithms produce valid topological orderings, but they will generally produce different orderings from each other — and neither ordering is "more correct." Any ordering satisfying the topological property is valid.
Active Exercise
Work through the following exercises without referring back to the procedures above.
Exercise 1 — Kahn's algorithm
Given this DAG with vertices {A, B, C, D, E} and edges: A→C, B→C, B→D, C→E, D→E:
- Compute the in-degree for each vertex.
- Run Kahn's algorithm step by step, showing the queue and output list at each step.
- Write down one valid topological ordering.
- Is there another valid ordering? Write it down if so.
Exercise 2 — DFS-based sort
Using the same graph above:
- Start DFS from A, then B (if unvisited after A's traversal is done).
- Record the order in which vertices are pushed to the stack.
- Pop the stack to produce the topological order.
- Does it match the ordering from Exercise 1? Should it?
Exercise 3 — Cycle identification
Suppose you add the edge E→B to the graph from Exercise 1.
- Run Kahn's algorithm. What happens?
- Which vertices remain with non-zero in-degree at the end?
- What do those vertices have in common?
Exercise 4 — Design problem
You are building a task runner for a CI pipeline. Tasks are defined in a config file with depends_on fields listing other task names. Describe in plain language (no code required) how you would:
- Construct a graph from the config.
- Validate that the config has no circular dependencies.
- Determine a valid execution order.
- Identify which tasks can be started in parallel at any given point.
Key Takeaways
- A Directed Acyclic Graph is a directed graph with no cycles. The absence of cycles is the defining property that makes DAGs tractable for ordering problems.
- Topological sorting exists if and only if the graph is a DAG. Cycles make topological ordering mathematically impossible — any cycle creates contradictory ordering constraints.
- Kahn's algorithm (BFS-based) repeatedly removes zero-in-degree vertices. It produces the order iteratively and detects cycles naturally when vertices remain at the end.
- DFS-based topological sort places each vertex onto a stack after fully exploring its descendants. Then it reverses the stack. Both algorithms run in O(V + E) time.
- Real-world applications include build systems, package dependency resolution, task scheduling, data pipelines, and Git commit graphs. These cover any domain where things must happen in a dependency-respecting sequence.
Further Exploration
Foundational Theory
- Topological sorting — Wikipedia — Full treatment including the proof of the if-and-only-if condition and historical context
- Topologically Sorting a Directed Acyclic Graph (CLRS 22.4) — Lecture notes closely following the CLRS textbook treatment
Algorithm Walkthroughs
- Lecture 14: DFS and Topological Sort — MIT OpenCourseWare — MIT 6.006 lecture covering DFS-based topological sort with formal analysis
- Kahn's Algorithm — GeeksforGeeks — Worked implementation and step-by-step trace
- ICS 46 Topological Ordering Notes — UC Irvine — Clear classroom notes with annotated pseudocode for both algorithms
Interactive & Advanced
- DAGs and Topological Sort — NetworkX Notebooks — Interactive Python-based exploration of DAG algorithms
- Incremental Cycle Detection, Topological Ordering, and Strong Component Maintenance — ACM — Primary research on maintaining topological order dynamically as edges are added