Control-Flow Graph Construction

Problem definition

A control-flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution.

In a control-flow graph each node in the graph represents a basic block, i.e. a straight-line sequence of code with a single entry point and a single exit point, where no branches or jumps occur within the block.

Directed edges are used to represent jumps in the control flow. There are, in most presentations, two specially designated blocks: the entry block, through which control enters into the flow graph, and the exit block, through which all control flow leaves.

Building a CFG is purely a syntactic process, we split code into basic blocks and link them based on the branch or jump instruction we see.

This makes it decidable (meaning we can achieve a perfectly accurate CFG) because:

  1. We only look at each instruction once:
    • Scan the code top to bottom to identify basic block boundaries (leaders) and terminators (br, switch, return, etc.).
    • No recursive or infinite reasoning—every jump target is explicitly named in the text.
  2. Edges come directly from the code:
    • For each block terminator, we determine its type (true branch, false branch, exit, etc.) and link blocks accordingly.

Plain text algorithm

We perform these phases in one pass over all basic blocks:

  1. Identify blocks:
    • Scan the program from top to bottom, marking any instruction that is the first in the function, a target of a branch, or immediately after a terminator as a leader.
    • Partition the instructions into BasicBlock BLOCK nodes so each BLOCK has one entry (its first instruction) and one exit (its terminator instruction).
  2. Initialize:
    • For each BasicBlock BLOCK, set BLOCK.predecessors = [] and BLOCK.successors = [].
    • Set ENTRY_BLOCK to the first BasicBlock in the function.
    • Optionally, create a dummy exit block named EXIT_BLOCK with empty lists.
  3. Link edges:
    • For each BasicBlock BLOCK, let TERMINATOR be the last instruction in BLOCK.
    • Unconditional branch: if TERMINATOR is br TARGET, where TARGET is the label of the jump target,
      • Add TARGET to BLOCK.successors.
      • Add BLOCK to TARGET.predecessors.
    • Conditional branch: if TERMINATOR is br i1 CONDITION, TRUE_TARGET, FALSE_TARGET,
      • CONDITION is the boolean value being tested.
      • TRUE_TARGET is the label when CONDITION is true.
      • FALSE_TARGET is the label when CONDITION is false.
      • Add TRUE_TARGET and FALSE_TARGET to BLOCK.successors, and add BLOCK to both TRUE_TARGET.predecessors and FALSE_TARGET.predecessors.
    • Return: if TERMINATOR is return,
      • If EXIT_BLOCK exists, add EXIT_BLOCK to BLOCK.successors and add BLOCK to EXIT_BLOCK.predecessors.
      • Otherwise, leave BLOCK.successors empty so that BLOCK is treated as an exit block.
  4. Complete:
    • ENTRY_BLOCK is the CFG’s start.
    • Any BasicBlock whose successors list is empty is an exit block (or it is EXIT_BLOCK if created).

Pseudo-code

for each BasicBlock BLOCK in the function:
  // Initialization phase
  BLOCK.predecessors = []
  BLOCK.successors   = []
if using a dummy exit:
  create BasicBlock EXIT_BLOCK
  EXIT_BLOCK.predecessors = []
  EXIT_BLOCK.successors   = []
ENTRY_BLOCK = the first BasicBlock

for each BasicBlock BLOCK:
  // Determine the control flow edges from the terminator
  let TERMINATOR = BLOCK.terminator
  if TERMINATOR is "br TARGET":
    // Unconditional branch to TARGET
    BLOCK.successors.push(TARGET)
    TARGET.predecessors.push(BLOCK)
  else if TERMINATOR is "br i1 CONDITION, TRUE_TARGET, FALSE_TARGET":
    // Conditional branch on CONDITION
    // TRUE_TARGET when CONDITION is true
    // FALSE_TARGET when CONDITION is false
    BLOCK.successors.push(TRUE_TARGET)
    TRUE_TARGET.predecessors.push(BLOCK)
    BLOCK.successors.push(FALSE_TARGET)
    FALSE_TARGET.predecessors.push(BLOCK)
  else if TERMINATOR is "return":
    // Return instruction
    if EXIT_BLOCK exists:
      BLOCK.successors.push(EXIT_BLOCK)
      EXIT_BLOCK.predecessors.push(BLOCK)
    // else, BLOCK.successors remains empty

Mathematical Formulation

$$\mathrm{CFG} = (B, E, \mathit{entry}, \mathit{exit})$$ $$E = \bigl\{(b,b') \mid b,b'\in B,\; b'\text{ is a target of }b\text{’s terminator}\bigr\}$$

Reading: E is the collection of all ordered pairs (b,b′)(b,b′) where control can go directly from block bb to block b′b′ (because bb’s terminator says “jump to b′b′”).

Notation: