Engineering

DAGs and Topological Sort

How acyclic structure enables ordered execution — and the two algorithms that exploit it

Learning Objectives

By the end of this module you will be able to:

  • Define a Directed Acyclic Graph (DAG) and identify one in a diagram.
  • State and explain why topological sort exists if and only if the graph is a DAG.
  • Execute Kahn's algorithm on a small DAG and explain the role of in-degree tracking.
  • Execute the DFS-based topological sort algorithm and explain the postorder reversal step.
  • State the time and space complexity of both algorithms.
  • Identify at least three real-world software engineering problems that reduce to topological ordering.

Core Concepts

Directed Acyclic Graphs

A Directed Acyclic Graph (DAG) is a directed graph where following the directions of the edges will never form a closed loop. Every edge points from one vertex to another, and no sequence of edges ever leads back to a vertex already visited. That single constraint — no cycles — is what makes DAGs special and practically powerful.

Fig 1
A B C D E F
A DAG with six vertices and no directed cycles. Arrows represent dependency direction.

DAGs differ from general directed graphs only in this one property, but that property unlocks a great deal: it becomes possible to establish a globally consistent ordering over all vertices, something cycles fundamentally prevent.

Topological Ordering

Topological sorting produces a linear ordering of all vertices such that for every directed edge u → v, vertex u appears before vertex v in the ordering. The ordering respects the entire dependency structure of the graph: nothing comes before something it depends on.

For every directed edge u → v, u must appear before v in the topological order.

A DAG can have multiple valid topological orderings (whenever two vertices have no dependency relationship between them, either can come first). The algorithms below find one valid ordering — not necessarily the unique one.

The If-and-Only-If Condition

A topological ordering exists if and only if the graph is a DAG. This is not just a practical observation — it is a mathematical necessity.

The reason cycles break topological ordering is straightforward: in a cycle u → v → w → u, you would need u before v, v before w, and w before u simultaneously. Those three constraints are mutually contradictory. No linear sequence can satisfy all of them at once. As soon as a cycle exists, topological ordering becomes impossible by the transitivity of the ordering constraint.

Conversely, any DAG has at least one valid topological ordering, and it can be computed in linear time. This guarantee is what makes DAGs so useful: you can always find a safe execution order.

Connection to the previous module

Cycle detection and topological sorting are dual problems. If you run a topological sort and it succeeds, the graph is acyclic. If it fails partway through, a cycle exists. Both Kahn's algorithm and the DFS approach expose this duality directly.


Step-by-Step Procedure

Algorithm 1: Kahn's Algorithm (BFS-based)

Kahn's algorithm builds the topological order by repeatedly identifying and removing vertices that have no remaining prerequisites — vertices whose in-degree has dropped to zero.

In-degree is the count of incoming edges for a vertex. A vertex with in-degree zero has no dependencies: nothing must come before it.

Steps:

  1. Compute the in-degree for every vertex.
  2. Enqueue all vertices with in-degree zero into a queue.
  3. While the queue is not empty:
    • Dequeue a vertex u. Append u to the result list.
    • For each neighbor v of u, decrement v's in-degree by one.
    • If v's in-degree reaches zero, enqueue v.
  4. If the result list contains all vertices, you have a valid topological order.
  5. If the result list is shorter than the total number of vertices, the remaining vertices are part of a cycle — the graph is not a DAG.

Decision point at step 3: when multiple vertices have in-degree zero simultaneously, any of them may be dequeued next. This is where multiple valid orderings arise.

Complexity: Θ(V + E) — every vertex is enqueued and dequeued exactly once, and every edge is examined exactly once when decrementing in-degrees.


Algorithm 2: DFS-based Topological Sort

The DFS-based approach uses the postorder finish time of depth-first search. A vertex is added to the result only after all vertices reachable from it have been fully explored — meaning all its dependencies in the output direction have already been placed.

Steps:

  1. Mark all vertices as unvisited.
  2. For each unvisited vertex u (in any order):
    • Recursively visit all unvisited neighbors of u (DFS).
    • After returning from all recursive calls, push u onto a stack.
  3. Pop all vertices from the stack to produce the topological order.

Why does postorder reversal work? A vertex u is pushed to the stack only after every vertex reachable from u has already been pushed. So when you reverse the stack, u appears before all vertices it points to. This is exactly the topological property.

Decision point: if the graph has a cycle, DFS will encounter a back edge — an edge pointing to an ancestor still on the recursion stack. You can detect this by tracking a "currently in stack" flag per vertex separately from the "visited" flag.

Complexity: O(V + E) time, O(V) auxiliary space for the visited array and the stack. Every vertex and edge is visited exactly once.


Worked Example

Consider this DAG representing a simplified build system:

compile-utils --> compile-core --> link --> package
                                /
         compile-tests ---------

Edges: compile-utils → compile-core, compile-core → link, compile-tests → link, link → package.

Kahn's algorithm walkthrough:

StepIn-degreesQueueOutput
Initcompile-utils=0, compile-core=1, compile-tests=0, link=2, package=1[compile-utils, compile-tests][]
Dequeue compile-utilscompile-core drops to 0[compile-tests, compile-core][compile-utils]
Dequeue compile-testslink drops to 1[compile-core][compile-utils, compile-tests]
Dequeue compile-corelink drops to 0[link][compile-utils, compile-tests, compile-core]
Dequeue linkpackage drops to 0[package][..., link]
Dequeue package[][..., package]

Final order: compile-utils → compile-tests → compile-core → link → package

All five vertices are in the output, so no cycle exists. This is a valid build order.

DFS walkthrough (same graph):

Starting DFS from compile-utils:

  • Visit compile-utils → recurse into compile-core
    • Visit compile-core → recurse into link
      • Visit link → recurse into package
        • Visit package → no neighbors → push package
      • Return → push link
    • Return → push compile-core
  • Return → push compile-utils
  • Now visit compile-tests (unvisited) → link already visited → push compile-tests

Stack (bottom to top): package, link, compile-core, compile-utils, compile-tests

Popped order: compile-tests, compile-utils, compile-core, link, package

Both are valid topological orderings of the same DAG.


Annotated Case Study

npm's Dependency Resolver

When you run npm install in a Node.js project, the package manager must determine which packages to install and in what order. This is a direct application of DAG-based dependency resolution.

The graph: packages are vertices. A directed edge from package A to package B means "A depends on B" — B must be installed before A can work.

Why the graph must be a DAG: if package A depends on B and B depends on A, neither can be installed first. This is a cycle, and it makes the dependency graph unresolvable. Package managers enforce DAG structure by rejecting or flagging circular dependencies.

What topological sort produces: a valid installation sequence. Every package is installed only after all its own dependencies are already available. Without a topological ordering, the installer would risk installing a package before its prerequisites exist.

Where it gets more complex: real package managers handle version constraints, optional dependencies, and peer dependencies — but the core ordering problem reduces to topological sort on a DAG. The same pattern appears in Python's pip, Rust's cargo, and Java's Maven.

Git's commit graph

Git stores commit history as a DAG: each commit points to its parent commit(s). This is why git log can traverse history in a consistent order, and why merges create vertices with two parents. There are no cycles because you cannot be your own ancestor.


Compare & Contrast

Kahn's Algorithm vs. DFS-based Topological Sort

DimensionKahn's AlgorithmDFS-based Sort
Core mechanismBFS, in-degree trackingDFS, postorder stack
Time complexityΘ(V + E)O(V + E)
Space complexityO(V) for queue + in-degree arrayO(V) for stack + visited array
Cycle detectionNatural: leftover vertices with non-zero in-degree signal a cycleRequires explicit back-edge detection via a "in-stack" flag
Output orderProcesses vertices with zero in-degree first — "sources first"Processes deepest vertices first, reverses — "sinks first, then reversed"
Multiple valid orderingsQueue ordering controls which valid ordering is producedDFS start order and adjacency list order affect the result
Parallelism insightVertices in the queue at the same step can run concurrentlyDoes not directly surface concurrent candidates
Preferred whenYou need to expose which tasks can run in parallel, or you want explicit cycle detection integrated into the sortYou already have a DFS framework and want to extend it with minimal code
A common confusion

Both algorithms produce valid topological orderings, but they will generally produce different orderings from each other — and neither ordering is "more correct." Any ordering satisfying the topological property is valid.


Active Exercise

Work through the following exercises without referring back to the procedures above.

Exercise 1 — Kahn's algorithm

Given this DAG with vertices {A, B, C, D, E} and edges: A→C, B→C, B→D, C→E, D→E:

  1. Compute the in-degree for each vertex.
  2. Run Kahn's algorithm step by step, showing the queue and output list at each step.
  3. Write down one valid topological ordering.
  4. Is there another valid ordering? Write it down if so.

Exercise 2 — DFS-based sort

Using the same graph above:

  1. Start DFS from A, then B (if unvisited after A's traversal is done).
  2. Record the order in which vertices are pushed to the stack.
  3. Pop the stack to produce the topological order.
  4. Does it match the ordering from Exercise 1? Should it?

Exercise 3 — Cycle identification

Suppose you add the edge E→B to the graph from Exercise 1.

  1. Run Kahn's algorithm. What happens?
  2. Which vertices remain with non-zero in-degree at the end?
  3. What do those vertices have in common?

Exercise 4 — Design problem

You are building a task runner for a CI pipeline. Tasks are defined in a config file with depends_on fields listing other task names. Describe in plain language (no code required) how you would:

  1. Construct a graph from the config.
  2. Validate that the config has no circular dependencies.
  3. Determine a valid execution order.
  4. Identify which tasks can be started in parallel at any given point.

Key Takeaways

  1. A Directed Acyclic Graph is a directed graph with no cycles. The absence of cycles is the defining property that makes DAGs tractable for ordering problems.
  2. Topological sorting exists if and only if the graph is a DAG. Cycles make topological ordering mathematically impossible — any cycle creates contradictory ordering constraints.
  3. Kahn's algorithm (BFS-based) repeatedly removes zero-in-degree vertices. It produces the order iteratively and detects cycles naturally when vertices remain at the end.
  4. DFS-based topological sort places each vertex onto a stack after fully exploring its descendants. Then it reverses the stack. Both algorithms run in O(V + E) time.
  5. Real-world applications include build systems, package dependency resolution, task scheduling, data pipelines, and Git commit graphs. These cover any domain where things must happen in a dependency-respecting sequence.

Further Exploration

Foundational Theory

Algorithm Walkthroughs

Interactive & Advanced