Call-Graph Construction

Problem definition

A call graph is an artifact produced by program analysis tools to record the relationships between a function and the functions it calls. Building a callgraph, like many static analyses, is known to be an undecidable problem, meaning there is no perfect solution to build a perfect static callgraph. It relies on tricks and techniques to approximate the best callgraph possible.

Numerous techniques exist to improve callgraph generation at compile time, such as Reach Analysis, Class Hierarchy Analysis, Rapid Type Analysis, etc. Most of these techniques are useful in an Object Oriented program paradigm; since this tool focuses on Rust, the solution differs a bit. (Maybe in the future support for OOP languages will be added.)

Call-graph analysis on Rust’s LLVM IR is easier because:

Monomorphization is the process where the compiler takes a generic function or type (one written to work over any type T) and, at compile time, generates a separate, concrete version for each actual type it’s used with.

We solve both in our fixed-point loop by:

Because monomorphized generic calls show up as call @foo_u32 or call @foo_String (i.e. direct getCalledFunction() hits), the algorithm does nothing special for generics: they already appear as direct edges in step 1 of each iteration.

Key insight: we don’t just scan for direct calls—we first run a flow-insensitive points-to analysis over the entire IR to conservatively resolve all function-pointer targets. That initial points-to graph seeds our indirect‑call edges and lets us iterate to a much more precise call graph.

Plain text algorithm

We repeat these steps until we add no new edges:

  1. Walk every basic block (using FI points-to info):

    Prerequisite: have initial points-to sets for all function‑pointer variables, computed by a fast flow-insensitive pass.

    • On a direct call, add an edge: caller → callee.
    • On an indirect call, look up the variable’s current FI points-to set (PFI[p]) and add a provisional edge to each potential target; remember this call-site for later refinement.
    • On any pointer operation (store, load, bitcast, GEP, φ-node, return, parameter), record that the source pointer can flow to the destination pointer.
  2. Pointer-tracking phase:
    • Seed: Initialize each pointer’s set with its FI points-to results (all v-table slots & address-taken functions).
    • Flood: Propagate those seeds along pointer‑flow edges to refine the points-to graph before each call-resolution pass.
  3. Resolve indirect calls:

    Using the refined points-to graph from Step 2…

    • For each remembered indirect call site, look up its variable’s final function set.
    • Build our graph by adding edges from the caller to each function in that set.
    • If any new edge was added, repeat from step 1.

Pseudo-code

repeat until we add no new edges

1. walk every basic block
     • direct call   → add caller → callee edge
     • indirect call → remember call-site + operand
     • pointer copy  → remember edge  src ↦ dst
       (store, load, bitcast, getelementptr, φ, return, param)

2. pointer-tracking phase
     • seed: every v-table slot + address-taken function
     • work-list flood:
         while some P(v) just grew
             for (v → w) in copy-edges  push w
             P(w) ← P(w) ∪ (new funcs)

3. resolve remembered indirect calls
     • for each call-site c
         for t in P(op(c))   add caller(c) → t
         if any edge new   repeat outer loop

Mathematical Formulation

$$P^* \;=\; \mathrm{lfp}_P\!\Bigl(v \;\mapsto\; \mathrm{Seeds}(v)\;\cup\;\bigcup_{(u,v)\in E}P(u)\Bigr)$$ $$\mathrm{CG} \;=\; \bigl\{\,(\mathrm{caller}(c),\,t)\mid c\in C,\; t\in \begin{cases} \{\mathrm{callee}(c)\} & \text{if $c$ is direct},\\[4pt] P^*(\mathrm{op}(c)) & \text{if $c$ is indirect} \end{cases}\bigr\}$$

Here, the least-fixed-point computation of P* is exactly our FI points-to pass—its results feed directly into CG to resolve indirect calls.

Notation: