Graph Representations

Choosing the right data structure before writing a single line of algorithm code

Learning Objectives

By the end of this module you will be able to:

  • Describe the adjacency matrix and adjacency list representations and sketch each for a small, concrete graph.
  • State the space complexity of each major representation in terms of V (vertices) and E (edges).
  • Compare edge-lookup and neighbour-scan time complexity across representations.
  • Explain why adjacency lists are generally preferred for sparse graphs.
  • Identify scenarios where an adjacency matrix or CSR format is more appropriate.
  • Describe the edge-list representation and state when it is convenient.
  • Articulate the tradeoffs introduced by hash-table-backed adjacency structures.

Core Concepts

Before any graph algorithm can run, the graph must be stored in memory. The choice of data structure shapes everything that follows: how much memory the program consumes, how fast edge lookups execute, and how cleanly neighbour enumeration fits into a traversal loop. There is no single best representation — only better fits for a given graph's density and the operations your algorithm prioritises.

Adjacency Matrix

An adjacency matrix is a |V| × |V| two-dimensional array. Cell [u][v] holds 1 (or an edge weight) if an edge exists from u to v, and 0 otherwise. For an undirected graph the matrix is symmetric; for a directed graph each direction is stored independently.

The defining characteristic is its O(V²) space regardless of how many edges actually exist. A 10,000-node graph allocates 100 million cells whether it has 15,000 edges or 99 million. That memory is wasted whenever edges are absent — a real cost for sparse graphs. The payoff is O(1) edge lookup: checking whether an edge exists between u and v is a single array access. This makes the matrix attractive for algorithms that perform repeated arbitrary edge queries, such as all-pairs shortest paths (Floyd–Warshall). The cost of neighbour enumeration is equally fixed: scanning the row for vertex u always takes O(V) time, even if the vertex has only two neighbours.
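
A minimal sketch of both operations, using a list of lists for an unweighted directed graph (the vertex count and edges are illustrative):

```python
def build_matrix(num_vertices, edges):
    """Build a dense adjacency matrix from (u, v) edge pairs."""
    m = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        m[u][v] = 1  # for an undirected graph, also set m[v][u] = 1
    return m

m = build_matrix(4, [(0, 1), (0, 2), (2, 3)])

# O(1) edge lookup: a single indexed read.
print(m[0][2])  # 1
print(m[1][3])  # 0

# O(V) neighbour scan: the whole row is read even for a low-degree vertex.
neighbours = [v for v, bit in enumerate(m[0]) if bit]
print(neighbours)  # [1, 2]
```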

When does O(V²) space become acceptable?

For dense graphs where edge count approaches V², the matrix wastes little space — most cells are occupied. The density crossover point is roughly d > 1/64: above that fraction of possible edges, a matrix becomes competitive on space and retains its O(1) lookup advantage.
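
The heuristic can be made concrete with a back-of-envelope memory model. The sketch below assumes a bit-packed matrix (one bit per cell) and roughly 8 bytes per adjacency-list entry; both constants are assumptions that vary by language and implementation, which is exactly why the threshold is a rule of thumb rather than a guarantee.

```python
def matrix_bytes(v: int) -> int:
    # Bit-packed dense matrix: one bit per cell, so V^2 / 8 bytes.
    return v * v // 8

def adj_list_bytes(v: int, e: int, bytes_per_entry: int = 8) -> int:
    # O(V + E) entries at an assumed 8 bytes each (an index or pointer).
    return (v + e) * bytes_per_entry

v = 10_000
for density in (1 / 1024, 1 / 64, 1 / 4):
    e = int(v * v * density)
    print(f"density 1/{round(1 / density)}: "
          f"matrix {matrix_bytes(v):>12,} B, list {adj_list_bytes(v, e):>12,} B")
```

Under these assumed constants the two estimates cross near d = 1/64: the list is far smaller at 1/1024, roughly even at 1/64, and much larger at 1/4.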

Adjacency List

An adjacency list represents the graph as an array (or map) of V entries, where each entry holds the list of neighbours for that vertex. The total space consumed is O(V + E) — proportional only to what actually exists.

The operational profile is the inverse of the matrix: edge lookup between u and v requires scanning u's neighbour list, which costs O(d) where d is the degree of u, while neighbour enumeration is also O(d) — you touch exactly the neighbours that exist, with no wasted cells.

Most graphs that engineers encounter in practice — social networks, dependency graphs, road networks, call graphs — are sparse. Adjacency lists fit them naturally.
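
A minimal sketch using a dict of lists (the vertex labels are illustrative):

```python
from collections import defaultdict

def build_adj_list(edges):
    """Build an adjacency list (vertex -> list of neighbours) from edge pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)  # for an undirected graph, also adj[v].append(u)
    return adj

adj = build_adj_list([("A", "B"), ("A", "C"), ("B", "D")])

# Neighbour enumeration is O(d): only real neighbours are touched.
print(adj["A"])  # ['B', 'C']

# Edge lookup is O(d): a linear scan of one neighbour list.
print("C" in adj["A"])  # True
print("D" in adj["A"])  # False
```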

CLRS Chapter 22 and Khan Academy's representation guide both establish adjacency lists as the standard choice for sparse graphs.

Edge List

An edge list is the simplest possible representation: a flat collection of pairs (u, v) — one entry per edge. Space is O(E), and the structure is trivial to build.

The weakness is that finding all neighbours of a vertex requires scanning the entire list, with no structural shortcut. Edge lists shine in two narrow contexts: algorithms that iterate over every edge exactly once (Kruskal's minimum spanning tree is the canonical example) and as an interchange format that can be ingested and converted into a more powerful structure.
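
A sketch of the structure and its characteristic query costs (the edges are illustrative):

```python
# One entry per edge; space is O(E) and construction is trivial.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]

# The pattern edge lists are good at: touching every edge exactly once
# (Kruskal's algorithm does this after sorting edges by weight).
edge_count = sum(1 for _ in edges)
print(edge_count)  # 5

# The pattern edge lists are bad at: a neighbour query is a full O(E) scan.
def neighbours(edges, u):
    return [v for (src, v) in edges if src == u]

print(neighbours(edges, "A"))  # ['B', 'C']
```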

Compressed Sparse Row (CSR)

CSR is a compact, array-only encoding of the adjacency list idea. Instead of separate linked lists, it packs all neighbour data into three flat arrays: one for values, one for column indices, and one for row pointers. The result is better memory locality and cache efficiency than pointer-based adjacency lists, which matters when processing very large static graphs.

The fundamental constraint is immutability: inserting or removing an edge typically requires rebuilding the entire structure. CSR is therefore a specialist tool for static graph workloads — graph analytics, scientific simulation, ML pipelines — where the topology is fixed and traversal performance is the bottleneck.
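
The three-array layout can be sketched as follows. This is an illustrative construction, not a reference implementation: edges are sorted by source so each vertex's neighbours occupy one contiguous, sorted slice, which also enables the binary-search edge lookup.

```python
from bisect import bisect_left

def build_csr(num_vertices, weighted_edges):
    """Pack (u, v, w) edges into CSR arrays: row pointers, column indices, values."""
    weighted_edges = sorted(weighted_edges)   # group by source; targets end up sorted
    indptr = [0] * (num_vertices + 1)
    indices, values = [], []
    for u, v, w in weighted_edges:
        indptr[u + 1] += 1
        indices.append(v)
        values.append(w)
    for i in range(num_vertices):             # prefix-sum row counts into pointers
        indptr[i + 1] += indptr[i]
    return indptr, indices, values

indptr, indices, values = build_csr(
    5, [(0, 1, 1.0), (0, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0), (3, 4, 1.0)])

# Neighbour scan: a contiguous slice — O(d) and cache-friendly.
u = 0
print(indices[indptr[u]:indptr[u + 1]])  # [1, 2]

# Edge lookup: binary search within the sorted neighbour slice, O(log d).
def has_edge(indptr, indices, u, v):
    lo, hi = indptr[u], indptr[u + 1]
    i = bisect_left(indices, v, lo, hi)
    return i < hi and indices[i] == v

print(has_edge(indptr, indices, 0, 2))  # True
```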

Hash-Table-Backed Adjacency

A variant of the adjacency list replaces each neighbour list with a hash set. This recovers O(1) edge lookup (matching the matrix) while retaining O(V + E) space. The catch is overhead: a hash table's internal array has unused slots, so the constant factor on memory is larger than a compact linked list. It also adds implementation complexity and per-operation overhead relative to array-based lists.

This is a practical optimisation for cases where frequent edge-existence checks are required and the graph is too sparse for a matrix. Outside that niche, standard adjacency lists remain preferred.
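
In Python the variant is a one-line change from the list version — a dict of sets instead of a dict of lists (vertex labels illustrative):

```python
from collections import defaultdict

def build_hash_adj(edges):
    """Adjacency structure with a hash set per vertex instead of a list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    return adj

adj = build_hash_adj([("A", "B"), ("A", "C"), ("B", "D")])

# Edge lookup is O(1) on average: a hash probe, not a list scan.
print("C" in adj["A"])  # True

# Neighbour enumeration is still O(d), though iteration order is unspecified.
print(sorted(adj["A"]))  # ['B', 'C']
```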

Compare & Contrast

The table below captures the five representations side by side on the axes that matter most for algorithm design: space, edge lookup, neighbour scan, and mutability.

Fig 1
Representation          Space                  Edge Lookup   Neighbour Scan   Dynamic?
Adjacency Matrix        O(V²)                  O(1)          O(V)             Yes
Adjacency List          O(V + E)               O(d)          O(d)             Yes
Edge List               O(E)                   O(E)          O(E)             Yes
CSR                     O(V + E)               O(log d)*     O(d)             No
Hash-Table Adjacency    O(V + E) + overhead    O(1) avg      O(d)             Yes

* CSR edge lookup via binary search on sorted neighbour array; d = vertex degree
Space and time complexity across graph representations

Key distinctions to internalise:

  • The matrix trades space for lookup speed. It is the only structure that delivers O(1) edge lookup without a hash table.
  • Adjacency lists and CSR share the same O(V + E) space profile, but CSR packs data into cache-friendly arrays at the cost of immutability.
  • Edge lists are the simplest structure but the worst general-purpose one. They make sense when your algorithm needs only to iterate edges, not query them.
  • Hash-table adjacency is a hybrid that closes the lookup gap between lists and matrices, but introduces constant-factor overhead that standard lists avoid.

Worked Example

Consider a small directed graph of five vertices representing dependencies between build targets:

A → B
A → C
B → D
C → D
D → E

As an adjacency matrix (5 × 5, rows = source, columns = destination):

     A  B  C  D  E
A  [ 0  1  1  0  0 ]
B  [ 0  0  0  1  0 ]
C  [ 0  0  0  1  0 ]
D  [ 0  0  0  0  1 ]
E  [ 0  0  0  0  0 ]

Checking whether B → C exists: read matrix[B][C] = 0. One operation. Listing all dependencies of A: scan row A, finding indices 1 and 2 — but you read five cells to discover two edges.

As an adjacency list:

A: [B, C]
B: [D]
C: [D]
D: [E]
E: []

Checking whether B → C exists: scan [D], find no match. One comparison. Listing all dependencies of A: read [B, C] directly — two reads for two edges, no wasted work.

As an edge list:

[(A,B), (A,C), (B,D), (C,D), (D,E)]

Listing all dependencies of A: scan all five pairs looking for A as the source — three unnecessary comparisons for a vertex with two edges. For this query pattern, the edge list is the weakest choice.
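
The three query traces above can be reproduced directly — a small Python sketch over the same five-edge dependency graph:

```python
V = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
idx = {name: i for i, name in enumerate(V)}

# Adjacency matrix: rows = source, columns = destination.
matrix = [[0] * len(V) for _ in V]
for u, v in edges:
    matrix[idx[u]][idx[v]] = 1

# Adjacency list.
adj = {name: [] for name in V}
for u, v in edges:
    adj[u].append(v)

# Does B -> C exist? Matrix: one read. List: scan B's single-entry list.
print(matrix[idx["B"]][idx["C"]])  # 0
print("C" in adj["B"])             # False

# Dependencies of A. Matrix: read 5 cells to find 2. List: read exactly 2.
print([V[j] for j, bit in enumerate(matrix[idx["A"]]) if bit])  # ['B', 'C']
print(adj["A"])                                                 # ['B', 'C']

# Edge list: scan all 5 pairs to find A's 2 dependencies.
print([v for (u, v) in edges if u == "A"])                      # ['B', 'C']
```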

Boundary Conditions

Understanding when a representation breaks down is as important as knowing when it works.

Adjacency matrix breaks down when:

  • The graph is sparse (most cells remain zero, wasting O(V²) memory).
  • V is large — a graph with 1 million vertices requires a matrix with 10¹² cells, which is not feasible in memory.
  • The algorithm iterates over neighbours frequently; O(V) per neighbour scan dominates runtime.

Adjacency list breaks down when:

  • The algorithm requires repeated arbitrary edge lookups. Each lookup is O(d); in high-degree graphs this becomes costly.
  • The graph is dense — the list approach offers no space advantage when nearly all edges exist, yet loses the O(1) lookup of the matrix.

Edge list breaks down when:

  • Any operation other than "iterate all edges" is needed. Neighbour queries and edge lookups are both O(E), making it the worst structure for traversal-heavy algorithms.

CSR breaks down when:

  • The graph is dynamic. Inserting or removing even a single edge typically forces a rebuild of all three arrays.
  • The build cost cannot be amortised. For a one-shot traversal of a modest graph, a plain adjacency list is simpler and fast enough.

Hash-table adjacency breaks down when:

  • Memory is constrained. Hash tables carry overhead from unused internal slots. For large sparse graphs with tight memory budgets, the constant-factor cost matters.
  • The extra complexity is not justified. Standard adjacency lists are more common in practice and sufficient for most algorithms that do not require frequent edge-existence checks.

The density threshold is not a sharp line

The rule of thumb that edge density d > 1/64 favours a matrix is a useful heuristic, not a guarantee. Real decisions also depend on access patterns, hardware cache behaviour, and whether the graph changes over time. Use the threshold to orient your thinking, then benchmark if performance is critical.

Key Takeaways

  1. Adjacency matrix = O(V²) space, O(1) edge lookup, O(V) neighbour scan. Best for dense graphs or algorithms that need fast arbitrary edge queries. Impractical for large sparse graphs.
  2. Adjacency list = O(V + E) space, O(d) edge lookup, O(d) neighbour scan. The default choice for sparse graphs and traversal-heavy algorithms such as BFS and DFS.
  3. Edge list = O(E) space, O(E) for any query. Only practical for algorithms that iterate every edge once (e.g., Kruskal's) or as an input format to be converted.
  4. CSR trades mutability for cache efficiency. It matches the adjacency list's O(V + E) space while improving traversal speed on large static graphs. It cannot be updated without a full rebuild.
  5. Hash-table adjacency is a targeted optimisation. It recovers O(1) edge lookup on sparse graphs but introduces memory overhead and complexity. Reach for it only when edge-existence queries are a bottleneck.
