Control-Flow Graph Construction
Problem definition
A control-flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution.
In a control-flow graph each node in the graph represents a basic block, i.e. a straight-line sequence of code with a single entry point and a single exit point, where no branches or jumps occur within the block.
Directed edges are used to represent jumps in the control flow. There are, in most presentations, two specially designated blocks: the entry block, through which control enters into the flow graph, and the exit block, through which all control flow leaves.
Building a CFG is purely a syntactic process, we split code into basic blocks and link them based on the branch or jump instruction we see.
This makes it decidable (meaning we can achieve a perfectly accurate CFG) because:
-
We only look at each instruction once:
- Scan the code top to bottom to identify basic block boundaries (leaders) and terminators (
br
,switch
,return
, etc.). - No recursive or infinite reasoning—every jump target is explicitly named in the text.
- Scan the code top to bottom to identify basic block boundaries (leaders) and terminators (
-
Edges come directly from the code:
- For each block terminator, we determine its type (true branch, false branch, exit, etc.) and link blocks accordingly.
Plain text algorithm
We perform these phases in one pass over all basic blocks:
-
Identify blocks:
- Scan the program from top to bottom, marking any instruction that is the first in the function, a target of a branch, or immediately after a terminator as a
leader
. - Partition the instructions into
BasicBlock BLOCK
nodes so eachBLOCK
has one entry (its first instruction) and one exit (its terminator instruction).
- Scan the program from top to bottom, marking any instruction that is the first in the function, a target of a branch, or immediately after a terminator as a
-
Initialize:
- For each
BasicBlock BLOCK
, setBLOCK.predecessors = []
andBLOCK.successors = []
. - Set
ENTRY_BLOCK
to the firstBasicBlock
in the function. - Optionally, create a dummy exit block named
EXIT_BLOCK
with empty lists.
- For each
-
Link edges:
- For each
BasicBlock BLOCK
, letTERMINATOR
be the last instruction inBLOCK
. - Unconditional branch: if
TERMINATOR
isbr TARGET
, whereTARGET
is the label of the jump target,- Add
TARGET
toBLOCK.successors
. - Add
BLOCK
toTARGET.predecessors
.
- Add
- Conditional branch: if
TERMINATOR
isbr i1 CONDITION, TRUE_TARGET, FALSE_TARGET
,CONDITION
is the boolean value being tested.TRUE_TARGET
is the label whenCONDITION
is true.FALSE_TARGET
is the label whenCONDITION
is false.- Add
TRUE_TARGET
andFALSE_TARGET
toBLOCK.successors
, and addBLOCK
to bothTRUE_TARGET.predecessors
andFALSE_TARGET.predecessors
.
- Return: if
TERMINATOR
isreturn
,- If
EXIT_BLOCK
exists, addEXIT_BLOCK
toBLOCK.successors
and addBLOCK
toEXIT_BLOCK.predecessors
. - Otherwise, leave
BLOCK.successors
empty so thatBLOCK
is treated as an exit block.
- If
- For each
-
Complete:
ENTRY_BLOCK
is the CFG’s start.- Any
BasicBlock
whosesuccessors
list is empty is an exit block (or it isEXIT_BLOCK
if created).
Pseudo-code
for each BasicBlock BLOCK in the function:
// Initialization phase
BLOCK.predecessors = []
BLOCK.successors = []
if using a dummy exit:
create BasicBlock EXIT_BLOCK
EXIT_BLOCK.predecessors = []
EXIT_BLOCK.successors = []
ENTRY_BLOCK = the first BasicBlock
for each BasicBlock BLOCK:
// Determine the control flow edges from the terminator
let TERMINATOR = BLOCK.terminator
if TERMINATOR is "br TARGET":
// Unconditional branch to TARGET
BLOCK.successors.push(TARGET)
TARGET.predecessors.push(BLOCK)
else if TERMINATOR is "br i1 CONDITION, TRUE_TARGET, FALSE_TARGET":
// Conditional branch on CONDITION
// TRUE_TARGET when CONDITION is true
// FALSE_TARGET when CONDITION is false
BLOCK.successors.push(TRUE_TARGET)
TRUE_TARGET.predecessors.push(BLOCK)
BLOCK.successors.push(FALSE_TARGET)
FALSE_TARGET.predecessors.push(BLOCK)
else if TERMINATOR is "return":
// Return instruction
if EXIT_BLOCK exists:
BLOCK.successors.push(EXIT_BLOCK)
EXIT_BLOCK.predecessors.push(BLOCK)
// else, BLOCK.successors remains empty
Mathematical Formulation
Reading: E is the collection of all ordered pairs (b,b′)(b,b′) where control can go directly from block bb to block b′b′ (because bb’s terminator says “jump to b′b′”).
Notation:
- CFG — The control‑flow graph
- = — “equals” or “is defined as”
- ( , ) — An ordered 4‑tuple
- B — The set of all basic blocks in the function
- , — Separates tuple components
- E — The set of directed edges between blocks
- entry — The special block where control begins
- exit — The special block where control ends
- = — Defines E in set‑comprehension form
- { … } — A set of ordered pairs
(b, b')
- (b, b') — An edge from block
b
to blockb'
- | (“such that”) — Introduces the conditions for membership in E
- b, b' ∈ B — Both
b
andb'
are basic blocks in B - , — “and” (both conditions must hold)
- b' is a jump target of b’s terminator — Block
b'
appears as one of the labels in the last instruction (the terminator) of blockb