Control-Flow Graph Construction
Problem definition
A control-flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution.
In a control-flow graph each node in the graph represents a basic block, i.e. a straight-line sequence of code with a single entry point and a single exit point, where no branches or jumps occur within the block.
Directed edges are used to represent jumps in the control flow. There are, in most presentations, two specially designated blocks: the entry block, through which control enters into the flow graph, and the exit block, through which all control flow leaves.
Building a CFG is purely a syntactic process, we split code into basic blocks and link them based on the branch or jump instruction we see.
This makes it decidable (meaning we can achieve a perfectly accurate CFG) because:
-
We only look at each instruction once:
- Scan the code top to bottom to identify basic block boundaries (leaders) and terminators (
br,switch,return, etc.). - No recursive or infinite reasoning—every jump target is explicitly named in the text.
- Scan the code top to bottom to identify basic block boundaries (leaders) and terminators (
-
Edges come directly from the code:
- For each block terminator, we determine its type (true branch, false branch, exit, etc.) and link blocks accordingly.
Plain text algorithm
We perform these phases in one pass over all basic blocks:
-
Identify blocks:
- Scan the program from top to bottom, marking any instruction that is the first in the function, a target of a branch, or immediately after a terminator as a
leader. - Partition the instructions into
BasicBlock BLOCKnodes so eachBLOCKhas one entry (its first instruction) and one exit (its terminator instruction).
- Scan the program from top to bottom, marking any instruction that is the first in the function, a target of a branch, or immediately after a terminator as a
-
Initialize:
- For each
BasicBlock BLOCK, setBLOCK.predecessors = []andBLOCK.successors = []. - Set
ENTRY_BLOCKto the firstBasicBlockin the function. - Optionally, create a dummy exit block named
EXIT_BLOCKwith empty lists.
- For each
-
Link edges:
- For each
BasicBlock BLOCK, letTERMINATORbe the last instruction inBLOCK. - Unconditional branch: if
TERMINATORisbr TARGET, whereTARGETis the label of the jump target,- Add
TARGETtoBLOCK.successors. - Add
BLOCKtoTARGET.predecessors.
- Add
- Conditional branch: if
TERMINATORisbr i1 CONDITION, TRUE_TARGET, FALSE_TARGET,CONDITIONis the boolean value being tested.TRUE_TARGETis the label whenCONDITIONis true.FALSE_TARGETis the label whenCONDITIONis false.- Add
TRUE_TARGETandFALSE_TARGETtoBLOCK.successors, and addBLOCKto bothTRUE_TARGET.predecessorsandFALSE_TARGET.predecessors.
- Return: if
TERMINATORisreturn,- If
EXIT_BLOCKexists, addEXIT_BLOCKtoBLOCK.successorsand addBLOCKtoEXIT_BLOCK.predecessors. - Otherwise, leave
BLOCK.successorsempty so thatBLOCKis treated as an exit block.
- If
- For each
-
Complete:
ENTRY_BLOCKis the CFG’s start.- Any
BasicBlockwhosesuccessorslist is empty is an exit block (or it isEXIT_BLOCKif created).
Pseudo-code
for each BasicBlock BLOCK in the function:
// Initialization phase
BLOCK.predecessors = []
BLOCK.successors = []
if using a dummy exit:
create BasicBlock EXIT_BLOCK
EXIT_BLOCK.predecessors = []
EXIT_BLOCK.successors = []
ENTRY_BLOCK = the first BasicBlock
for each BasicBlock BLOCK:
// Determine the control flow edges from the terminator
let TERMINATOR = BLOCK.terminator
if TERMINATOR is "br TARGET":
// Unconditional branch to TARGET
BLOCK.successors.push(TARGET)
TARGET.predecessors.push(BLOCK)
else if TERMINATOR is "br i1 CONDITION, TRUE_TARGET, FALSE_TARGET":
// Conditional branch on CONDITION
// TRUE_TARGET when CONDITION is true
// FALSE_TARGET when CONDITION is false
BLOCK.successors.push(TRUE_TARGET)
TRUE_TARGET.predecessors.push(BLOCK)
BLOCK.successors.push(FALSE_TARGET)
FALSE_TARGET.predecessors.push(BLOCK)
else if TERMINATOR is "return":
// Return instruction
if EXIT_BLOCK exists:
BLOCK.successors.push(EXIT_BLOCK)
EXIT_BLOCK.predecessors.push(BLOCK)
// else, BLOCK.successors remains empty
Mathematical Formulation
Reading: E is the collection of all ordered pairs (b,b′)(b,b′) where control can go directly from block bb to block b′b′ (because bb’s terminator says “jump to b′b′”).
Notation:
- CFG — The control‑flow graph
- = — “equals” or “is defined as”
- ( , ) — An ordered 4‑tuple
- B — The set of all basic blocks in the function
- , — Separates tuple components
- E — The set of directed edges between blocks
- entry — The special block where control begins
- exit — The special block where control ends
- = — Defines E in set‑comprehension form
- { … } — A set of ordered pairs
(b, b') - (b, b') — An edge from block
bto blockb' - | (“such that”) — Introduces the conditions for membership in E
- b, b' ∈ B — Both
bandb'are basic blocks in B - , — “and” (both conditions must hold)
- b' is a jump target of b’s terminator — Block
b'appears as one of the labels in the last instruction (the terminator) of blockb