Engineering

Cycle Detection

Four techniques for catching loops in directed graphs, undirected graphs, and linked structures

Learning Objectives

By the end of this module you will be able to:

  • Explain why a back edge in DFS signals a cycle in a directed graph.
  • Apply the DFS white/gray/black coloring method to detect cycles in a directed graph.
  • Describe how parent tracking makes DFS cycle detection correct for undirected graphs.
  • Explain how Union-Find detects cycles in undirected graphs and when it is preferred over DFS.
  • Describe Floyd's tortoise-and-hare algorithm and the graph structures it is designed for.
  • State the time and space complexity of each detection technique.
  • Identify real engineering scenarios where cycle detection is required.

Core Concepts

Why cycles matter

A cycle in a graph means a path exists that loops back to its own starting point. This matters enormously in engineering because many problems reduce to establishing an ordering of nodes — and an ordering is only possible when no cycles exist. Cycle detection is a prerequisite for topological sort, dependency resolution, and deadlock diagnosis.

Concrete scenarios where cycle detection is required:

  • Build systems and package managers — circular dependencies between packages or build targets cannot be resolved.
  • Database migrations — migration scripts with mutual dependencies cannot be run in any safe order.
  • Circuit evaluation and compilation — feedback loops in a dependency graph indicate a problematic structural condition.
  • Deadlock detection — a cycle in a resource-allocation graph means a set of processes is permanently blocked.
  • Course prerequisite systems — a cycle means students can never satisfy the requirements to enroll.

In every case the cycle must be detected and surfaced to the user, because the underlying ordering problem becomes infeasible the moment a cycle exists.


The back edge: the key concept in DFS-based detection

When DFS traverses a directed graph it builds a DFS tree — the spanning subgraph formed by the edges it actually follows. Edges in the original graph that are not in the tree fall into categories. The most important is the back edge.

A back edge is an edge that points from a vertex to one of its ancestors in the DFS tree.

A back edge is the necessary and sufficient condition for a cycle in a directed graph. If any back edge exists, a cycle exists. If no back edge exists, the graph is a DAG.

The intuition: if you are in the middle of recursing into vertex u and you find an edge that points back to an ancestor you are still visiting, you have just traced a closed path — a cycle.


DFS coloring for directed graphs

The canonical way to track what "still being visited" means during DFS is the white/gray/black color scheme.

ColorMeaning
White (0)Not yet visited
Gray (1)Currently in the recursion stack
Black (2)Fully processed

A cycle exists if and only if DFS encounters a gray node during traversal. A gray node is one that is already on the active call stack, so reaching it again means there is a path from it to itself.

Gray = in-progress

White means unstarted. Gray means started but not finished — the vertex is an ancestor in the current DFS path. Black means finished. Only a gray encounter indicates a back edge.


DFS with parent tracking for undirected graphs

Undirected graphs require a different treatment. Because every undirected edge (u, v) can be traversed in both directions, a naive DFS would immediately see u as a "visited ancestor" when backtracking from v, producing a false positive on every single edge.

The fix is parent tracking. When DFS visits vertex v from vertex u, it records u as the parent. If during the traversal from v it encounters an already-visited neighbor that is not u, that neighbor is reachable via a second path — which proves a cycle. The parent edge itself is deliberately excluded from the check.

Direction matters

Directed and undirected cycle detection use different logic. Applying the directed (coloring) method to an undirected graph without modification will produce false positives on every traversed edge.


Union-Find for undirected graphs

Union-Find (also called Disjoint-Set Union) offers an alternative to DFS for undirected graphs. The approach works as follows:

  1. Initially, each vertex is its own component (its own parent).
  2. For each edge (u, v), find the root representative of the set containing u and the root of the set containing v.
  3. If both roots are the same, u and v are already connected — adding this edge would create a cycle.
  4. If the roots differ, merge the two sets (union them) and continue.

The key advantage is that Union-Find naturally handles an incremental stream of edges. You do not need the entire graph in memory; you can check each edge as it arrives. This makes it the preferred approach when edges are inserted one at a time and you need a cycle check after each insertion.


Floyd's tortoise and hare for implicit graphs

DFS and Union-Find both require an explicit graph representation — an adjacency list or edge list. Some structures do not expose themselves that way. A singly linked list, for example, gives you only a next pointer from each node. You cannot enumerate its edges in advance.

Floyd's cycle-finding algorithm handles exactly this case. It uses two pointers:

  • Tortoise — advances one node per step.
  • Hare — advances two nodes per step.

If a cycle exists, the hare will eventually lap the tortoise and the two pointers will meet at the same node. If the hare reaches a null (end of the list), no cycle exists.

The algorithm runs in O(n) time and O(1) space — it uses only two pointer variables regardless of how large the structure is. This is what makes it the right tool for linked lists and similar implicit sequential structures.

Locating the cycle's entry point. Floyd's algorithm can be extended to find where the cycle begins. Once the two pointers meet, reset one pointer to the head. Advance both pointers one step at a time. The node where they next meet is the entry point of the cycle. This extension preserves the O(n) time and O(1) space bounds.


Incremental cycle detection

In some production systems, the graph is not static — edges are inserted over time and you need to know at the moment of each insertion whether a cycle has been introduced. This is the incremental cycle detection problem.

Research has established algorithms that maintain a topological ordering and detect cycles as edges are added, achieving bounds such as O(m^(3/2)) total time for m edge insertions. Applications include circuit evaluation, compilation dependency tracking, and deadlock detection in live systems. Running a full DFS after every edge insertion would be correct but expensive; incremental algorithms amortize the cost across insertions.


Step-by-Step Procedure

Directed graph: DFS with coloring

  1. Initialize a color array of size V, all set to white (0).
  2. For each vertex that is still white, call dfs(vertex).
  3. In dfs(u):
    • Mark u gray.
    • For each neighbor v of u:
      • If v is gray → cycle detected, return true.
      • If v is white → recurse: dfs(v). If it returns true, propagate true upward.
    • Mark u black.
    • Return false.
  4. If the outer loop completes without detecting a cycle → no cycle exists.

Decision point. If v is black, the edge leads to a fully processed vertex — this is a forward or cross edge, not a back edge, so it does not indicate a cycle. Only gray triggers the alarm.


Undirected graph: DFS with parent tracking

  1. Initialize a visited array of size V, all false.
  2. For each unvisited vertex, call dfs(vertex, -1) (no parent initially).
  3. In dfs(u, parent):
    • Mark u visited.
    • For each neighbor v of u:
      • If v == parent → skip (this is the edge we came from).
      • If v is visited → cycle detected, return true.
      • Otherwise → recurse: dfs(v, u). If it returns true, propagate true.
  4. Return false if no cycle found.

Undirected graph: Union-Find

  1. Initialize parent array where parent[i] = i for all vertices.
  2. For each edge (u, v):
    • Find root_u = find(u) and root_v = find(v).
    • If root_u == root_vcycle detected, stop.
    • Otherwise, union(root_u, root_v).
  3. If all edges are processed without a match → no cycle exists.

Optimization. Apply path compression in find and union by rank to keep operations near O(1) amortized per edge.


Linked list or sequential structure: Floyd's algorithm

  1. Initialize slow = head and fast = head.
  2. While fast and fast.next are not null:
    • Advance slow by one: slow = slow.next.
    • Advance fast by two: fast = fast.next.next.
    • If slow == fastcycle detected, stop.
  3. If the loop exits → no cycle (the hare hit the end).

To find the cycle entry (optional): 4. Reset slow = head. Keep fast at the meeting point. 5. Advance both one step at a time until slow == fast. 6. That node is the cycle's entry point.


Worked Example

Scenario. A build system has five tasks: A, B, C, D, E. The dependency edges are: A → B, B → C, C → D, D → B.

The last edge, D → B, points back to B which is already an ancestor in the DFS path when D is being processed.

Fig 1
A B C D E back edge (cycle!) black (done) gray (in stack) white (unvisited)
DFS traversal order with coloring. The edge D → B lands on a gray node, signaling a cycle.

Step-by-step trace:

  1. Start DFS from A. Color A gray.
  2. Visit B. Color B gray.
  3. Visit C. Color C gray.
  4. Visit D. Color D gray.
  5. Examine edge D → B. B is gray — it is still on the active call stack. This is a back edge. Cycle detected. The cycle is B → C → D → B.

E is unreachable from A and has no cycle of its own. It would be visited in the outer loop and colored black without incident.


Compare & Contrast

TechniqueGraph typeRepresentation requiredTimeSpaceBest used when
DFS + coloringDirectedAdjacency list / matrixO(V + E)O(V)Full graph available, directed
DFS + parent trackingUndirectedAdjacency list / matrixO(V + E)O(V)Full graph available, undirected
Union-FindUndirectedEdge list (stream)~O(E · α(V))O(V)Edges arrive incrementally
Floyd's tortoise & hareImplicit sequencesSingle next-pointerO(n)O(1)Linked lists, no explicit edge list

DFS coloring vs. DFS parent tracking. Both are DFS, but they solve fundamentally different problems. The coloring method exploits the fact that directed edges have a defined "pointing direction," so a gray hit is unambiguous. The parent-tracking method must explicitly suppress the traversal of the edge that was just used to arrive at a node, because undirected edges are bidirectional and would otherwise look like immediate back edges.

DFS vs. Union-Find for undirected graphs. Both run in near-linear time. Union-Find wins when edges arrive as a stream and you need an online decision at each step. DFS wins when you have the full graph in memory and also want to trace which vertices form the cycle.

Union-Find vs. Floyd's. These solve different problems and are not interchangeable. Union-Find works on explicit graphs with a vertex and edge set. Floyd's works on implicit single-pointer structures where the concept of a vertex set does not apply.


Common Misconceptions

"A cycle in an undirected graph is just two nodes connected by two edges." No. An undirected edge (u, v) is a single bidirectional connection, not two directed edges. A cycle in an undirected graph requires at least three vertices. Treating the reverse traversal of a single edge as a "second edge" is the error that makes parent tracking necessary.

"Detecting a gray node means detecting the cycle we just created." A gray node means there is some cycle, but the cycle might not involve the edge currently being examined directly. The cycle consists of the path from the gray ancestor down to the current node, plus the back edge pointing back up.

"Union-Find works on directed graphs." Union-Find as described here is for undirected graphs only. Directed graphs require DFS-based methods that respect edge direction. Applying Union-Find to directed edges will produce false positives: a directed edge A → B and another B → C would merge all three into one component, but that does not mean A → B → C is a cycle.

"Floyd's algorithm works on any graph." Floyd's tortoise-and-hare algorithm is designed for implicit sequential structures accessed through a single next pointer — primarily linked lists. It cannot be applied directly to a general graph without first transforming the traversal into a sequential form.

"A black node hit during DFS also indicates a cycle." It does not. A black node is fully processed. Reaching it again via a forward or cross edge is perfectly normal in a DAG. Only a gray node hit is the cycle indicator.


Boundary Conditions

Disconnected graphs. All DFS-based methods must iterate over every vertex and launch a new DFS from any unvisited vertex. A graph can have a cycle in one connected component and none in another. The outer loop is not optional.

Self-loops. A vertex with an edge to itself is trivially a cycle. In the DFS coloring scheme a self-loop (u, u) is caught correctly: when processing u's neighbors, u itself is gray, so the back-edge check fires. In parent-tracking for undirected graphs, a self-loop is (u, u) and the parent exclusion v != parent does not exclude u itself, so it is also caught correctly.

Floyd's algorithm requires a finite cycle to terminate. If the structure has no cycle, the hare reaches null and the algorithm terminates. If there is an infinite list without a cycle (theoretically), the algorithm loops forever. The algorithm is correct for finite structures with or without a cycle.

Union-Find and multi-edges. In an undirected graph, a multi-edge (two distinct edges between the same pair of vertices) is a cycle of length 2. Union-Find catches it correctly: after the first edge, u and v are in the same set, so the second edge triggers the cycle condition.

Incremental detection at scale. Running a full DFS after every edge insertion is O(V + E) per insertion, which becomes quadratic in the number of edges. This is the regime where incremental algorithms that maintain topological order and amortize cycle checks across insertions become important. For small or infrequently updated graphs, the simpler full-DFS approach is fine.

Key Takeaways

  1. Back edge = cycle. In a directed graph, a back edge — an edge from a vertex to a gray ancestor in the DFS tree — is the necessary and sufficient condition for a cycle. No back edge means no cycle.
  2. Coloring distinguishes state. The white/gray/black scheme gives each vertex a three-state status. Only a gray encounter signals a cycle; a black encounter is normal and does not.
  3. Directed and undirected logic differ. Undirected DFS must suppress the reverse traversal of the edge it arrived on (parent tracking), or every edge looks like a cycle. Never conflate the two approaches.
  4. Union-Find is incremental. When edges arrive as a stream and you need a cycle check per insertion, Union-Find is the natural fit — it does not require the full graph to be loaded first.
  5. Floyd's tortoise and hare is the linked-list specialist. For implicit single-pointer structures, it provides O(n) time and O(1) space — and can also locate the cycle entry point in a second pass at no additional asymptotic cost.

Further Exploration

Foundational Lectures

Research & Advanced Topics

Algorithms Textbooks & Practical Guides