Hardekopf & Lin: Sparse Flow-Sensitive Pointer Analysis

Background

Pointer analysis determines which pointers (or heap references) may refer to which memory locations. Alias analysis is the related question: whether two expressions can refer to the same location. Together these underpin many security analyses (e.g., detecting use-after-free or taint flows).

There are several dimensions of precision: flow sensitivity (respecting statement order), context sensitivity (distinguishing calling contexts), field sensitivity (distinguishing struct fields), and heap abstraction (how allocation sites are summarized).

The core challenge is that a fully flow-sensitive, precise analysis is expensive to scale to large codebases. Hardekopf & Lin’s algorithm achieves full flow sensitivity efficiently by combining a cheap flow-insensitive auxiliary (AUX) pass, which bootstraps call-graph and def-use information, with a sparse flow-sensitive solve over a compact def-use graph.

High-level pipeline

In sequence:

  1. Flow-insensitive AUX pointer analysis: conservatively approximate which loads and stores might interact; resolve indirect calls; seed the interprocedural CFG.
  2. Memory/pointer SSA & partitioning: build SSA form for all pointer variables (including address-taken/heap), partition heap variables into equivalence classes, and insert φ/χ/μ to version heap updates and merges.
  3. Construct the sparse Def-Use Graph (DUG): keep only relevant operations (ALLOC, COPY, LOAD, STORE, φ/χ/μ) and connect them with labeled/unlabeled edges encoding dataflow.
  4. Flow-sensitive points-to solve: propagate precise points-to sets over the DUG until a fixpoint is reached.
  5. Optional refinement: feed tighter results back to refine call graph / SSA and rerun for marginal gains.
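The statement classification in step 3 can be made concrete with a tiny example. The following is an invented four-statement program expressed as plain Python data; the statement forms and object names are illustrative, not from the paper:

```python
# Hypothetical program:  x = &o1;  y = x;  *y = x;  z = *y
# Each pointer-relevant statement becomes one DUG node of a given kind.
program = [
    ("ALLOC", "x", "o1"),   # x = &o1  -- x now points to object o1
    ("COPY",  "y", "x"),    # y = x    -- y copies x's targets
    ("STORE", "y", "x"),    # *y = x   -- write through y (heap def, a chi)
    ("LOAD",  "z", "y"),    # z = *y   -- read through y (heap use, a mu)
]

kinds = [stmt[0] for stmt in program]
assert kinds == ["ALLOC", "COPY", "STORE", "LOAD"]
```

These same four kinds reappear as the cases of the worklist solver in section 3.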

1. Preprocessing — Building the Sparse Def-Use Graph (DUG)

1.1 AUX pass (flow-insensitive)

Run a fast, flow-insensitive pointer analysis (e.g., Andersen-style) over the whole program. Its over-approximate results resolve indirect calls into a conservative call graph and predict which loads and stores may access which memory objects, which is exactly the def-use information the sparse construction needs.

1.2 Top-level SSA & partitioning

Convert top-level variables (those never address-taken) to SSA form directly, since their definitions are syntactically evident. Then, guided by AUX, partition the address-taken variables into equivalence classes so that variables accessed by the same loads and stores share a partition; heap SSA can then version whole partitions instead of individual locations.

1.3 Label loads and stores

Using the AUX results, annotate each load with a μ (may-use) and each store with a χ (may-def) for every partition the access might touch. These annotations mark where heap versions are used and redefined, which drives the SSA construction in the next step.

1.4 Heap-SSA on partitions

Run an SSA construction over each partition, placing φ/χ/μ nodes so that heap-related definitions and uses are versioned. Now every heap memory operation is in strict SSA form, separating “before” and “after” effects.
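The “before/after” separation can be illustrated with a small Python sketch. This is entirely illustrative: real heap SSA versions partitions symbolically at compile time, not with runtime maps.

```python
# Hypothetical sketch: heap-SSA versioning modeled with immutable states.
# A store (chi) derives a NEW version of its partition's state, so any use
# tied to the earlier version keeps seeing the targets live at its point.
from types import MappingProxyType

def chi_store(state, loc, targets):
    """Model P_next = chi(P_prev): copy, update, and freeze the new version."""
    new = dict(state)
    new[loc] = frozenset(targets)
    return MappingProxyType(new)

P1 = MappingProxyType({"a": frozenset({"o1"})})   # version 1 of partition P
P2 = chi_store(P1, "a", {"o2"})                   # *p = &o2  ->  P2 = chi(P1)

assert P1["a"] == frozenset({"o1"})   # a use of version 1 is unaffected
assert P2["a"] == frozenset({"o2"})   # a use of version 2 sees the store
```

The immutability is the point: a later store can never retroactively change what an earlier use observed.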

1.5 Construct the DUG

Create one node per pointer-relevant statement (ALLOC, COPY, LOAD, STORE) and per φ. Add an unlabeled edge from each top-level SSA definition to each of its uses, and an edge labeled with partition P from each node that defines a version of P (a store’s χ or a φ) to each node that uses it (a load’s μ, a store, or a φ). Every other statement is dropped, which is what makes the subsequent solve sparse.

2. Data structures for the sparse solver

The solver maintains: PG[x], the points-to graph of each top-level variable x; for each LOAD/STORE/φ node k, the incoming heap state IN_k (and, for stores, the outgoing state OUT_k), mapping address-taken variables to their targets; and a worklist of DUG nodes, initially holding every node. All these graphs start empty. The partition function part(v) maps an address-taken variable v to its partition.
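A hypothetical Python shape for this state (names chosen to mirror the pseudocode in section 3; a production implementation would use far more compact set representations):

```python
from collections import defaultdict

# PG[x]: objects a top-level variable x may point to.
PG = defaultdict(set)

# IN[k][o] / OUT[k][o]: targets of address-taken variable o flowing into /
# out of DUG node k.
IN  = defaultdict(lambda: defaultdict(set))
OUT = defaultdict(lambda: defaultdict(set))

# part[o]: the partition an address-taken variable o belongs to.
part = {}

assert PG["x"] == set()       # all graphs start empty
assert IN[1]["o"] == set()
```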

3. Sparse flow-sensitive fixpoint iteration

while Worklist ≠ ∅:
    k = removeOne(Worklist)

    switch (kind of node k):
      case ALLOC:
        let x = &o
        Δ = {o}
        PG[x] ∪= Δ
        propagate Δ along all unlabeled outgoing edges from k

      case COPY:
        let x = y ∪ z ∪ …        // a copy may merge several top-level sources
        Δ = PG[y] ∪ PG[z] ∪ …
        PG[x] ∪= Δ
        propagate Δ along all unlabeled outgoing edges

      case LOAD (x = *y, with μ(P)):
        IN_k[P] = merge of incoming states for P        // heap state reaching k
        Δ = ⋃_{o ∈ PG[y]} IN_k[P](o)                    // look up y's possible targets
        PG[x] ∪= Δ
        propagate Δ along all unlabeled outgoing edges  // x is top-level

      case STORE (*x = y, with χ(P)):
        IN_k[P] = merge of incoming states for P
        if PG[x] = {o} for a single concrete location o:
          OUT_k[P] = IN_k[P] with o's targets replaced by PG[y]   // strong update
        else:
          OUT_k[P] = IN_k[P] with PG[y] added to o's targets
                     for each o ∈ PG[x]                           // weak update
        Δ = OUT_k[P] ∖ IN_k[P]
        propagate Δ along all outgoing edges labeled P

      case φ (merge point for partition P):
        IN_k[P] = union of incoming IN/OUT states for P
        Δ = the newly added entries of IN_k[P]
        propagate Δ along all outgoing edges labeled P

    // propagation: if target's graph grows, re-add that node to Worklist

Propagation means: for each target node reachable from k via a DUG edge, merge Δ into its corresponding graph (PG or IN), and if that target gained new info, put it back on the worklist.
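Putting the worklist loop and this propagation rule together, here is a minimal runnable sketch in Python. The DUG is hand-built for the invented program `x = &o1; y = x; *y = x; z = *y`, only weak updates are implemented, and all names are illustrative:

```python
from collections import defaultdict, deque

nodes = {
    1: ("ALLOC", "x", "o1"),
    2: ("COPY",  "y", "x"),
    3: ("STORE", "y", "x"),    # *y = x
    4: ("LOAD",  "z", "y"),    # z = *y
}
unlabeled = {1: [2, 3], 2: [3, 4]}   # top-level def-use edges
labeled   = {3: [(4, "P")]}          # heap flow: store -> load, partition P
part      = {"o1": "P"}              # o1 belongs to partition P

def solve():
    PG  = defaultdict(set)                        # top-level points-to
    IN  = defaultdict(lambda: defaultdict(set))   # heap state reaching a node
    OUT = defaultdict(lambda: defaultdict(set))   # heap state leaving a store
    wl = deque(sorted(nodes))
    while wl:
        k = wl.popleft()
        kind, a, b = nodes[k]
        top_grew = heap_grew = False
        if kind == "ALLOC":                       # a = &b
            if b not in PG[a]:
                PG[a].add(b); top_grew = True
        elif kind == "COPY":                      # a = b
            if not PG[b] <= PG[a]:
                PG[a] |= PG[b]; top_grew = True
        elif kind == "LOAD":                      # a = *b
            val = set()
            for o in PG[b]:
                val |= IN[k][o]
            if not val <= PG[a]:
                PG[a] |= val; top_grew = True
        elif kind == "STORE":                     # *a = b  (weak update only)
            for o in set(IN[k]) | PG[a]:
                new = IN[k][o] | (PG[b] if o in PG[a] else set())
                if not new <= OUT[k][o]:
                    OUT[k][o] |= new; heap_grew = True
        if top_grew:                              # wake top-level successors
            wl.extend(unlabeled.get(k, []))
        if heap_grew or kind == "STORE":          # push along labeled edges
            for succ, p in labeled.get(k, []):
                for o, targets in OUT[k].items():
                    if part.get(o) == p and not targets <= IN[succ][o]:
                        IN[succ][o] |= targets
                        wl.append(succ)           # successor gained info
    return PG, OUT

PG, OUT = solve()
assert PG["x"] == {"o1"} and PG["y"] == {"o1"}
assert OUT[3]["o1"] == {"o1"}    # after the store, object o1 points to o1
assert PG["z"] == {"o1"}         # the load recovers it through the labeled edge
```

Note how node 4 only receives heap facts through the single labeled edge from node 3; no control-flow graph is ever traversed.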

Why this is flow-sensitive and sparse

Flow-sensitive: The DUG edges, combined with SSA versioning, encode the causal order—uses pull from the precise definitions in effect at that program point. Updates don’t retroactively affect earlier uses because each definition created a new version.

Sparse: The algorithm never walks the full control-flow graph. It only propagates along the tiny DUG containing pointer-relevant operations, so irrelevant statements are skipped entirely.

This staging—one cheap flow-insensitive pass to bootstrap, then one precise flow-sensitive solve on a compressed graph—is what gives scalability to millions of lines of code.

Comparison / notes

Chase et al. proposed dynamically maintaining SSA during flow-sensitive pointer analysis; their idea was not evaluated directly, but related approaches (e.g., Tok et al.) showed the scalability limits of updating SSA on the fly. Hardekopf & Lin instead compute SSA once up front (with AUX guidance) and then solve sparsely, avoiding repeated recomputation of SSA and yielding practical performance on large programs.

Mathematical formulation (intuition)

$$P^* = \mathrm{lfp}\left(P \mapsto \lambda v.\; \mathrm{Seeds}(v) \cup \bigcup_{(u,v)\in E} P(u)\right)$$

That is, the solution is the least fixed point of propagation over the DUG edges $E$, reached by iterating from the bottom (empty) sets.

Interpretation: starting from conservative seeds (from AUX), propagate along the sparse edges; because the transfer function is monotonic over the finite lattice of points-to sets, the iteration converges to the least fixed point.
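Read literally, the formula is just Kleene iteration of a monotone step function from bottom. A toy Python rendering (the graph and seed sets are invented for illustration):

```python
def lfp(step, bottom):
    """Iterate a monotone function from bottom until it stabilizes."""
    x = bottom
    while True:
        nxt = step(x)
        if nxt == x:          # fixed point reached
            return x
        x = nxt

# P(v) = Seeds(v) ∪ ⋃_{(u,v)∈E} P(u)  on a 3-node sparse graph a -> b -> c:
E = {("a", "b"), ("b", "c")}
seeds = {"a": {"o1"}, "b": set(), "c": {"o2"}}

def step(P):
    return {v: seeds[v] | set().union(*[P[u] for (u, w) in E if w == v])
            for v in seeds}

bottom = {v: set() for v in seeds}
result = lfp(step, bottom)
assert result == {"a": {"o1"}, "b": {"o1"}, "c": {"o1", "o2"}}
```

Because each iteration only ever adds elements to finite sets, termination at the least fixed point is guaranteed.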