Connectivity and Components
Understanding reachability, connected components, and the structure of directed graphs
Learning Objectives
By the end of this module you will be able to:
- Define graph connectivity and distinguish between connected and disconnected graphs.
- Explain the difference between weak and strong connectivity in directed graphs.
- Describe how BFS or DFS can identify connected components in an undirected graph.
- State what a strongly connected component (SCC) is and identify SCCs in a small directed graph.
- Describe at a high level how DFS-based SCC algorithms (Kosaraju and Tarjan) work.
- Explain why reachability in directed graphs is asymmetric compared to undirected graphs.
Core Concepts
Paths and Reachability
Before asking "is this graph connected?", you need a precise definition of what it means for two vertices to be reachable from one another.
In graph theory, a path is a sequence of vertices where each consecutive pair is connected by an edge. A simple path contains no repeated vertices. A walk is the more permissive variant: vertices may be revisited. The length of a path is the number of edges it contains, not the number of vertices.
Paths are the atomic unit of connectivity reasoning. When you ask "can service A reach service B in this dependency graph?", you are asking whether a path exists between two vertices. BFS and DFS are the standard tools for answering that question.
Graph Connectivity
Connectivity in graph theory addresses the minimum number of vertices or edges that must be removed to disconnect a graph into isolated subgraphs. Two measures formalize this:
- Vertex connectivity (k): the graph remains connected whenever fewer than k vertices are removed. A graph is k-vertex-connected if you must remove at least k vertices to disconnect it.
- Edge connectivity (k): the size of the smallest cut — the minimum set of edges whose removal disconnects the graph.
These properties characterize how robust a graph structure is against the removal of individual elements — a question that maps directly onto concerns like network fault tolerance or service resilience.
For practical purposes in software engineering, the first thing to establish is simply whether a graph is connected at all, before reasoning about degrees of robustness.
Connected Components in Undirected Graphs
An undirected graph is connected if there is a path between every pair of vertices. If no such universal path exists, the graph is disconnected, and it decomposes into two or more connected components — each being a maximal connected subgraph.
Paths form the basis for algorithms that determine reachability between vertices, such as BFS and DFS. Concretely, to find all connected components of an undirected graph:
- Mark all vertices as unvisited.
- Pick any unvisited vertex. Run BFS or DFS from it, marking all reachable vertices as belonging to the same component.
- If unvisited vertices remain, pick one and repeat from step 2 — each new traversal starts a new component.
The result is a partition of all vertices into components. This runs in O(V + E) time.
Strong Connectivity in Directed Graphs
In a directed graph, edges have direction, so reachability is no longer symmetric. The fact that you can reach B from A does not mean you can reach A from B. This breaks the single notion of "connected" into two:
- Weakly connected: the graph is connected if you ignore edge directions (i.e., treat all edges as undirected).
- Strongly connected: a directed graph is strongly connected if there exists a directed path from every vertex x to every other vertex y — and vice versa, from y to x — for all pairs of vertices.
Strong connectivity means mutual reachability: following the arrows, you can get from anywhere to anywhere.
Strongly Connected Components (SCCs)
Most real directed graphs are not globally strongly connected. Instead, they contain strongly connected components: maximal subgraphs within which every vertex can reach every other vertex by following directed edges. Vertices that cannot reach each other — or can only be reached in one direction — belong to different SCCs.
Once you compute the SCCs of a directed graph and condense each SCC into a single node, the resulting graph is a directed acyclic graph (DAG). This condensation is a powerful structural insight: it separates the cyclic (mutually reachable) parts of your graph from the strictly ordered parts.
DFS-Based SCC Algorithms
Depth-first search is the foundation for computing SCCs in directed graphs. Kosaraju's algorithm and Tarjan's algorithm both use DFS and its edge classification properties to identify SCCs in O(V + E) time.
Kosaraju's algorithm uses two DFS passes:
- Run DFS on the original graph; record vertices in order of their finish time (push onto a stack as each vertex completes).
- Transpose the graph (reverse all edge directions).
- Run DFS on the transposed graph, processing vertices in reverse finish-time order (popping from the stack). Each DFS tree in this pass is one SCC.
The intuition: finish-time ordering in the first pass identifies "leaders" of SCCs. Reversing the graph prevents DFS from crossing SCC boundaries in the second pass.
Tarjan's algorithm uses a single DFS pass. It maintains a stack and assigns each vertex a discovery index and a low-link value (the smallest index reachable via the subtree rooted at that vertex). When a vertex's low-link equals its discovery index, it is the root of an SCC, and the stack above it forms that component.
Both algorithms run in linear time — O(V + E) — making them practical even on large graphs.
Compare & Contrast
Connectivity Semantics: Directed vs. Undirected
Cycle detection algorithms differ fundamentally between directed and undirected graphs because edge direction affects ancestor relationships in the DFS tree and consequently what constitutes a cyclic edge. The same asymmetry applies to connectivity more broadly.
| Property | Undirected Graph | Directed Graph |
|---|---|---|
| Reachability | Symmetric: if A can reach B, B can reach A | Asymmetric: A reaching B does not imply B can reach A |
| Connectivity notion | Connected / disconnected | Weakly connected / strongly connected |
| Component type | Connected component | Strongly connected component (SCC) |
| Finding components | Single BFS/DFS pass | Two-pass DFS (Kosaraju) or single-pass with low-links (Tarjan) |
| Back edge in DFS | Any edge to a previously visited vertex except the immediate parent | Any edge to an ancestor in the current DFS stack (indicates a cycle) |
| Component structure | Partition into connected components | Condensation into a DAG of SCCs |
In an undirected graph, a "back edge" during DFS is conventionally an edge to any previously visited vertex except the immediate parent. In a directed graph, a back edge specifically points to an ancestor in the current DFS call stack. Confusing these two definitions leads to false cycle detections. Parent-tracking is necessary in the undirected case to prevent false positives.
Weak vs. Strong Connectivity
| Weakly Connected | Strongly Connected | |
|---|---|---|
| Definition | Connected when edge directions are ignored | Every vertex can reach every other vertex following directed edges |
| Implies cycles? | No | Not necessarily (a single vertex is trivially strongly connected) |
| Practical use | Checking overall graph structure; detecting isolated nodes | Dependency cycles, mutual reachability, SCC decomposition |
| Finding | Treat as undirected, run BFS/DFS | Kosaraju or Tarjan |
Worked Example
Identifying SCCs by Hand
Consider this directed graph with 6 vertices:
A → B → C → A
↓
D → E → F
↑
D
More precisely, the edges are: A→B, B→C, C→A, C→D, D→E, E→D, E→F.
Step 1: Identify cycles.
- A→B→C→A forms a cycle. So {A, B, C} are mutually reachable.
- D→E→D forms a cycle. So {D, E} are mutually reachable.
- F has no outgoing edges back to D or E.
Step 2: Check cross-SCC reachability.
- C→D means {A,B,C} can reach {D,E}, but {D,E} cannot reach {A,B,C}.
- E→F means {D,E} can reach {F}, but F cannot reach {D,E}.
SCCs:
- SCC 1: {A, B, C}
- SCC 2: {D, E}
- SCC 3: {F}
Condensed DAG: SCC1 → SCC2 → SCC3. No cycles — it is a DAG as expected.
Annotated Case Study
Dependency Graphs and the Cost of Cycles
Consider a build system — Gradle, Bazel, or similar — where modules declare dependencies on one another. The build system models this as a directed graph: an edge A→B means "A depends on B" (A must be built after B).
The happy path: a DAG. If the dependency graph is a DAG, the build system can topologically sort it and build modules in the correct order. Each module's SCC is just itself — no module depends on itself transitively.
What happens when a cycle appears. Suppose module A depends on B, B depends on C, and C depends back on A. Now {A, B, C} form an SCC. The build system cannot order them: A must wait for B, B must wait for C, C must wait for A. Most build tools either reject this as an error or require the entire SCC to be compiled together as a single unit.
Why SCC decomposition is the right tool. Running Kosaraju or Tarjan on the dependency graph in O(V + E) time immediately reveals which modules are entangled in cycles. The condensed DAG shows the true build order at the SCC level. SCCs with more than one node flag a problem; they tell you exactly which modules are circularly coupled and need to be refactored.
The directed asymmetry matters here. A depends on B does not mean B depends on A. Reachability in a directed graph is asymmetric. If you mistakenly modelled this as an undirected graph, every pair of connected modules would appear "mutually dependent", making the graph seem far more entangled than it is.
Circular imports in Python packages, circular bean dependencies in Spring, and circular module references in TypeScript are all instances of this same structural problem. The underlying graph question is always: does this directed graph contain an SCC with more than one node?
Key Takeaways
- A path is the unit of reachability. Connectivity is defined by the existence of paths — sequences of vertices connected by edges — and DFS/BFS are the standard mechanisms for finding them.
- Undirected connectivity is binary and symmetric. A graph is either connected (path between all pairs) or it decomposes into connected components. Either way, reachability goes both ways.
- Directed graphs break symmetry. Reaching B from A does not mean you can reach A from B. This forces a distinction between weak connectivity (ignoring directions) and strong connectivity (mutual reachability following directions).
- SCCs are the natural unit of structure in directed graphs. A strongly connected component is a maximal set of vertices with mutual reachability. Every directed graph decomposes into SCCs, and the condensation of SCCs is always a DAG.
- Kosaraju and Tarjan find SCCs in linear time. Both use DFS — Kosaraju with two passes on the original and transposed graph, Tarjan with one pass using low-link values. The linear-time guarantee makes SCC decomposition practical on large graphs.
Further Exploration
Wikipedia References
- Connectivity (graph theory) — Formal definitions of vertex and edge connectivity with accessible examples.
- Graph (discrete mathematics) — Covers strong and weak connectivity in directed graphs.
- Cycle (graph theory) — Context for how cycles relate to connectivity and SCCs.
Academic Lectures
- CMU 15-210: Chapter 11 — Depth-First Search — Covers DFS edge classification and the basis for SCC algorithms.
- University of Toronto: DFS Lecture Notes — Walks through DFS finish times and SCC identification.
- Emory CS171: Directed and Edge-Weighted Graphs — A concise course page on directed graph properties and strong connectivity.
Textbooks
- Whitman College: Introduction to Graph Theory — Rigorous definitions of paths, walks, and connectivity for those who want the formal grounding.