Cycle Detection
Four techniques for catching loops in directed graphs, undirected graphs, and linked structures
Learning Objectives
By the end of this module you will be able to:
- Explain why a back edge in DFS signals a cycle in a directed graph.
- Apply the DFS white/gray/black coloring method to detect cycles in a directed graph.
- Describe how parent tracking makes DFS cycle detection correct for undirected graphs.
- Explain how Union-Find detects cycles in undirected graphs and when it is preferred over DFS.
- Describe Floyd's tortoise-and-hare algorithm and the graph structures it is designed for.
- State the time and space complexity of each detection technique.
- Identify real engineering scenarios where cycle detection is required.
Core Concepts
Why cycles matter
A cycle in a graph means a path exists that loops back to its own starting point. This matters enormously in engineering because many problems reduce to establishing an ordering of nodes — and an ordering is only possible when no cycles exist. Cycle detection is a prerequisite for topological sort, dependency resolution, and deadlock diagnosis.
Concrete scenarios where cycle detection is required:
- Build systems and package managers — circular dependencies between packages or build targets cannot be resolved.
- Database migrations — migration scripts with mutual dependencies cannot be run in any safe order.
- Circuit evaluation and compilation — feedback loops in a dependency graph indicate a problematic structural condition.
- Deadlock detection — a cycle in a resource-allocation graph means a set of processes is permanently blocked.
- Course prerequisite systems — a cycle means students can never satisfy the requirements to enroll.
In every case the cycle must be detected and surfaced to the user, because the underlying ordering problem becomes infeasible the moment a cycle exists.
The back edge: the key concept in DFS-based detection
When DFS traverses a directed graph it builds a DFS tree — the spanning subgraph formed by the edges it actually follows. Edges in the original graph that are not in the tree fall into categories. The most important is the back edge.
A back edge is an edge that points from a vertex to one of its ancestors in the DFS tree.
A back edge is the necessary and sufficient condition for a cycle in a directed graph. If any back edge exists, a cycle exists. If no back edge exists, the graph is a DAG.
The intuition: if you are in the middle of recursing into vertex u and you find an edge that points back to an ancestor you are still visiting, you have just traced a closed path — a cycle.
DFS coloring for directed graphs
The canonical way to track what "still being visited" means during DFS is the white/gray/black color scheme.
| Color | Meaning |
|---|---|
| White (0) | Not yet visited |
| Gray (1) | Currently in the recursion stack |
| Black (2) | Fully processed |
A cycle exists if and only if DFS encounters a gray node during traversal. A gray node is one that is already on the active call stack, so reaching it again means there is a path from it to itself.
White means unstarted. Gray means started but not finished — the vertex is an ancestor in the current DFS path. Black means finished. Only a gray encounter indicates a back edge.
DFS with parent tracking for undirected graphs
Undirected graphs require a different treatment. Because every undirected edge (u, v) can be traversed in both directions, a naive DFS would immediately see u as a "visited ancestor" when backtracking from v, producing a false positive on every single edge.
The fix is parent tracking. When DFS visits vertex v from vertex u, it records u as the parent. If during the traversal from v it encounters an already-visited neighbor that is not u, that neighbor is reachable via a second path — which proves a cycle. The parent edge itself is deliberately excluded from the check.
Directed and undirected cycle detection use different logic. Applying the directed (coloring) method to an undirected graph without modification will produce false positives on every traversed edge.
Union-Find for undirected graphs
Union-Find (also called Disjoint-Set Union) offers an alternative to DFS for undirected graphs. The approach works as follows:
- Initially, each vertex is its own component (its own parent).
- For each edge
(u, v), find the root representative of the set containinguand the root of the set containingv. - If both roots are the same,
uandvare already connected — adding this edge would create a cycle. - If the roots differ, merge the two sets (union them) and continue.
The key advantage is that Union-Find naturally handles an incremental stream of edges. You do not need the entire graph in memory; you can check each edge as it arrives. This makes it the preferred approach when edges are inserted one at a time and you need a cycle check after each insertion.
Floyd's tortoise and hare for implicit graphs
DFS and Union-Find both require an explicit graph representation — an adjacency list or edge list. Some structures do not expose themselves that way. A singly linked list, for example, gives you only a next pointer from each node. You cannot enumerate its edges in advance.
Floyd's cycle-finding algorithm handles exactly this case. It uses two pointers:
- Tortoise — advances one node per step.
- Hare — advances two nodes per step.
If a cycle exists, the hare will eventually lap the tortoise and the two pointers will meet at the same node. If the hare reaches a null (end of the list), no cycle exists.
The algorithm runs in O(n) time and O(1) space — it uses only two pointer variables regardless of how large the structure is. This is what makes it the right tool for linked lists and similar implicit sequential structures.
Locating the cycle's entry point. Floyd's algorithm can be extended to find where the cycle begins. Once the two pointers meet, reset one pointer to the head. Advance both pointers one step at a time. The node where they next meet is the entry point of the cycle. This extension preserves the O(n) time and O(1) space bounds.
Incremental cycle detection
In some production systems, the graph is not static — edges are inserted over time and you need to know at the moment of each insertion whether a cycle has been introduced. This is the incremental cycle detection problem.
Research has established algorithms that maintain a topological ordering and detect cycles as edges are added, achieving bounds such as O(m^(3/2)) total time for m edge insertions. Applications include circuit evaluation, compilation dependency tracking, and deadlock detection in live systems. Running a full DFS after every edge insertion would be correct but expensive; incremental algorithms amortize the cost across insertions.
Step-by-Step Procedure
Directed graph: DFS with coloring
- Initialize a color array of size V, all set to white (0).
- For each vertex that is still white, call
dfs(vertex). - In
dfs(u):- Mark
ugray. - For each neighbor
vofu:- If
vis gray → cycle detected, return true. - If
vis white → recurse:dfs(v). If it returns true, propagate true upward.
- If
- Mark
ublack. - Return false.
- Mark
- If the outer loop completes without detecting a cycle → no cycle exists.
Decision point. If v is black, the edge leads to a fully processed vertex — this is a forward or cross edge, not a back edge, so it does not indicate a cycle. Only gray triggers the alarm.
Undirected graph: DFS with parent tracking
- Initialize a visited array of size V, all false.
- For each unvisited vertex, call
dfs(vertex, -1)(no parent initially). - In
dfs(u, parent):- Mark
uvisited. - For each neighbor
vofu:- If
v == parent→ skip (this is the edge we came from). - If
vis visited → cycle detected, return true. - Otherwise → recurse:
dfs(v, u). If it returns true, propagate true.
- If
- Mark
- Return false if no cycle found.
Undirected graph: Union-Find
- Initialize parent array where
parent[i] = ifor all vertices. - For each edge
(u, v):- Find
root_u = find(u)androot_v = find(v). - If
root_u == root_v→ cycle detected, stop. - Otherwise,
union(root_u, root_v).
- Find
- If all edges are processed without a match → no cycle exists.
Optimization. Apply path compression in find and union by rank to keep operations near O(1) amortized per edge.
Linked list or sequential structure: Floyd's algorithm
- Initialize
slow = headandfast = head. - While
fastandfast.nextare not null:- Advance
slowby one:slow = slow.next. - Advance
fastby two:fast = fast.next.next. - If
slow == fast→ cycle detected, stop.
- Advance
- If the loop exits → no cycle (the hare hit the end).
To find the cycle entry (optional):
4. Reset slow = head. Keep fast at the meeting point.
5. Advance both one step at a time until slow == fast.
6. That node is the cycle's entry point.
Worked Example
Scenario. A build system has five tasks: A, B, C, D, E. The dependency edges are: A → B, B → C, C → D, D → B.
The last edge, D → B, points back to B which is already an ancestor in the DFS path when D is being processed.
Step-by-step trace:
- Start DFS from
A. ColorAgray. - Visit
B. ColorBgray. - Visit
C. ColorCgray. - Visit
D. ColorDgray. - Examine edge
D → B.Bis gray — it is still on the active call stack. This is a back edge. Cycle detected. The cycle isB → C → D → B.
E is unreachable from A and has no cycle of its own. It would be visited in the outer loop and colored black without incident.
Compare & Contrast
| Technique | Graph type | Representation required | Time | Space | Best used when |
|---|---|---|---|---|---|
| DFS + coloring | Directed | Adjacency list / matrix | O(V + E) | O(V) | Full graph available, directed |
| DFS + parent tracking | Undirected | Adjacency list / matrix | O(V + E) | O(V) | Full graph available, undirected |
| Union-Find | Undirected | Edge list (stream) | ~O(E · α(V)) | O(V) | Edges arrive incrementally |
| Floyd's tortoise & hare | Implicit sequences | Single next-pointer | O(n) | O(1) | Linked lists, no explicit edge list |
DFS coloring vs. DFS parent tracking. Both are DFS, but they solve fundamentally different problems. The coloring method exploits the fact that directed edges have a defined "pointing direction," so a gray hit is unambiguous. The parent-tracking method must explicitly suppress the traversal of the edge that was just used to arrive at a node, because undirected edges are bidirectional and would otherwise look like immediate back edges.
DFS vs. Union-Find for undirected graphs. Both run in near-linear time. Union-Find wins when edges arrive as a stream and you need an online decision at each step. DFS wins when you have the full graph in memory and also want to trace which vertices form the cycle.
Union-Find vs. Floyd's. These solve different problems and are not interchangeable. Union-Find works on explicit graphs with a vertex and edge set. Floyd's works on implicit single-pointer structures where the concept of a vertex set does not apply.
Common Misconceptions
"A cycle in an undirected graph is just two nodes connected by two edges." No. An undirected edge (u, v) is a single bidirectional connection, not two directed edges. A cycle in an undirected graph requires at least three vertices. Treating the reverse traversal of a single edge as a "second edge" is the error that makes parent tracking necessary.
"Detecting a gray node means detecting the cycle we just created." A gray node means there is some cycle, but the cycle might not involve the edge currently being examined directly. The cycle consists of the path from the gray ancestor down to the current node, plus the back edge pointing back up.
"Union-Find works on directed graphs." Union-Find as described here is for undirected graphs only. Directed graphs require DFS-based methods that respect edge direction. Applying Union-Find to directed edges will produce false positives: a directed edge A → B and another B → C would merge all three into one component, but that does not mean A → B → C is a cycle.
"Floyd's algorithm works on any graph." Floyd's tortoise-and-hare algorithm is designed for implicit sequential structures accessed through a single next pointer — primarily linked lists. It cannot be applied directly to a general graph without first transforming the traversal into a sequential form.
"A black node hit during DFS also indicates a cycle." It does not. A black node is fully processed. Reaching it again via a forward or cross edge is perfectly normal in a DAG. Only a gray node hit is the cycle indicator.
Boundary Conditions
Disconnected graphs. All DFS-based methods must iterate over every vertex and launch a new DFS from any unvisited vertex. A graph can have a cycle in one connected component and none in another. The outer loop is not optional.
Self-loops. A vertex with an edge to itself is trivially a cycle. In the DFS coloring scheme a self-loop (u, u) is caught correctly: when processing u's neighbors, u itself is gray, so the back-edge check fires. In parent-tracking for undirected graphs, a self-loop is (u, u) and the parent exclusion v != parent does not exclude u itself, so it is also caught correctly.
Floyd's algorithm requires a finite cycle to terminate. If the structure has no cycle, the hare reaches null and the algorithm terminates. If there is an infinite list without a cycle (theoretically), the algorithm loops forever. The algorithm is correct for finite structures with or without a cycle.
Union-Find and multi-edges. In an undirected graph, a multi-edge (two distinct edges between the same pair of vertices) is a cycle of length 2. Union-Find catches it correctly: after the first edge, u and v are in the same set, so the second edge triggers the cycle condition.
Incremental detection at scale. Running a full DFS after every edge insertion is O(V + E) per insertion, which becomes quadratic in the number of edges. This is the regime where incremental algorithms that maintain topological order and amortize cycle checks across insertions become important. For small or infrequently updated graphs, the simpler full-DFS approach is fine.
Key Takeaways
- Back edge = cycle. In a directed graph, a back edge — an edge from a vertex to a gray ancestor in the DFS tree — is the necessary and sufficient condition for a cycle. No back edge means no cycle.
- Coloring distinguishes state. The white/gray/black scheme gives each vertex a three-state status. Only a gray encounter signals a cycle; a black encounter is normal and does not.
- Directed and undirected logic differ. Undirected DFS must suppress the reverse traversal of the edge it arrived on (parent tracking), or every edge looks like a cycle. Never conflate the two approaches.
- Union-Find is incremental. When edges arrive as a stream and you need a cycle check per insertion, Union-Find is the natural fit — it does not require the full graph to be loaded first.
- Floyd's tortoise and hare is the linked-list specialist. For implicit single-pointer structures, it provides O(n) time and O(1) space — and can also locate the cycle entry point in a second pass at no additional asymptotic cost.
Further Exploration
Foundational Lectures
- MIT OpenCourseWare — Lecture 14: DFS and Topological Sort — Foundational lecture covering DFS classification of edges and the back-edge theorem
- University of Illinois CS374 — DFS and Cycle Detection — Thorough treatment of directed and undirected cases with proof sketches
Research & Advanced Topics
- Incremental Cycle Detection, Topological Ordering, and Strong Component Maintenance — Primary research paper for incremental cycle detection in dynamic graphs
- Cycle detection — Wikipedia — Survey including Floyd's algorithm, Brent's algorithm, and their relationship
Algorithms Textbooks & Practical Guides
- Princeton Algorithms 4/e — Undirected Graphs — Union-Find and graph connectivity in context of full algorithms course
- GeeksforGeeks — Detect Cycle in a Directed Graph using Colors — Practical walkthrough with code
- GeeksforGeeks — Floyd's Cycle Finding Algorithm — Implementation details and the entry-point extension