I. Intraprocedural Analysis (2.1) A. definitions and notation 1. initial and final labels ------------------------------------------ INITIAL LABEL init: Stmt -> Lab init([x := a]^l) = l init([skip]^l) = l init(S1; S2) = init(if [b]^l then S1 else S2) = init (while [b]^l do S) = ------------------------------------------ What would the initial label of a nondeterministic choice statement be? Of a parallel composition statement? How would we generalize the formalism to handle such statements? ------------------------------------------ FINAL LABELS final: Stmt -> Powerset(Lab) final([x := a]^l) = {l} final([skip]^l) = {l} final(S1; S2) = final(if [b]^l then S1 else S2) = final (while [b]^l do S) = ------------------------------------------ What would the final label set of a nondeterministic choice statement be? Of a parallel composition statement? 2. blocks and labels ------------------------------------------ ELEMENTARY BLOCKS blocks: Stmt -> Powerset(Block) ------------------------------------------ How would you define the set of elementary blocks in a statement? What would the set of blocks in an assert statement be? What would be set of blocks in a nondeterministic choice statement be? How would you define the set of labels in a statement? 3. flows and reverse flows ------------------------------------------ FLOWS flow: Stmt -> Powerset(Lab x Lab) flow([x := a]^l) = {} flow([skip]^l) = {} flow(S1; S2) = flow(if [b]^l then S1 else S2) = flow(while [b]^l do S) = ------------------------------------------ How would we use these functions to represent the nodes and edges of a dataflow graph? What are labels and edges of if [x > 3]^1 then [y:=2]^2 else [z:=3]^3 ? How you formulate a set of reverse flows? What is flow^R of if [x > 3]^1 then [y:=2]^2 else [z:=3]^3 ? What are the initial nodes of flow^R? 4. program of interest ------------------------------------------ PROGRAM OF INTEREST S* = the top level statement Lab* = labels(S*) Var* = FV(S*) Blocks* = blocks(S*) ------------------------------------------ ------------------------------------------ ISOLATED ENTRIES AND EXITS def: S* has isolated entries iff def: S* has isolated exits iff ------------------------------------------ What kind of programs would not have isolated entries? What kind of programs would not have isolated exits? Could we convert programs to have both isolated entries and exits? ------------------------------------------ LABEL CONSISTENT def: S is label consistent if and only if no two blocks in S have the same label. ------------------------------------------ How would you formalize that? Is there any reason not to have label consistent programs? B. available expressions analysis (2.1.1) 1. trivial and non-trivial expressions What's a trivial expression? ------------------------------------------ SUBEXPRESSIONS Aexp(a) = non-trivial arithmetic subexpressions of a Aexp(b) = non-trivial arithmetic subexpressions of b Aexp* = nontrivial arithmetic expressions in S* ------------------------------------------ 2. idea, goal ------------------------------------------ AVAILABLE EXPRESSIONS ANALYSIS "For each program point, which [non-trivial] expressions must have already been computed, and not later modified, on all paths to that program point." Example: [k := i*j-1]^1; while [i*j-1 < n]^2 do ([t := a+k]^3; [j := j+1]^4; [k := i*j-1]^5;) ------------------------------------------ What non-trivial expressions are available at entry to block 2? 3. formalization ------------------------------------------ FORMAL DEFINITION AEentry(l) = if l = init(S*) then {} else \bigcap {AEexit(l') | (l',l) \in flow(S*)} AEexit(l) = (AEentry(l) \ killAE(B^l)) \cup genAE(B^l) where B^l \in blocks(S*) killAE: Blocks* -> Powerset(Aexp*) killAE([x:= a]^l) = {a' \in Aexp* | x \in FV(a')} killAE([skip]^l) = {} killAE([b]^l) = {} genAE: Blocks* -> Powerset(Aexp*) genAE([x:= a]^l) = {a' \in Aexp(a) | x \not\in FV(a')} genAE([skip]^l) = {} genAE([b]^l) = Aexp(b) ------------------------------------------ What's the role of the dataflow graph here? What does the kill function mean? What does the gen function mean? Why don't we have to define the analysis for while loops and if statements? What are we assuming with this formalism? How would we adjust this if we didn't have isolated entries? 4. observations This is a forward analysis, Why? What makes a "solution" unsafe? What makes it imprecise? We want the largest (safe) sets, Why? Note the use of the word "must", what impact does that have on the analysis? 5. example revisited ------------------------------------------ EXAMPLE [k := i*j-1]^1; while [i*j-1 < n]^2 do ([t := a+k]^3; [j := j+1]^4; [k := i*j-1]^5;) What's the control flow graph? ------------------------------------------ ------------------------------------------ KILL AND GEN What are killAE and genAE for this? l killAE(l) genAE(l) ============================ 1 2 3 4 5 ------------------------------------------ ------------------------------------------ EXAMPLE EQUATIONS AEentry(1) = AEentry(2) = AEentry(3) = AEentry(4) = AEentry(5) = AEexit(1) = AEexit(2) = AEexit(3) = AEexit(4) = AEexit(5) = ------------------------------------------ So what sets do we start with to find a solution? So what would the solution be? C. Reaching Definitions Analysis (2.1.2) ------------------------------------------ REACHING DEFINITIONS ANALYSIS (2.1.2) "For each program point, which assignments may have been made and not overwritten, when program execution reaches that point along some path?" ------------------------------------------ Is this a forward or backward analysis? What makes the analysis imprecise? So what solution do we want? Note the use of the word "may" in the analysis statement, what impact does that have on the analysis? D. Very Busy Expressions Analysis (2.1.3) 1. idea and goals ------------------------------------------ VERY BUSY EXPRESSIONS def: A non-trivial expression e is *very busy* at exit from block l if, e must always be used before some x \in FV(e) is assigned. At what points is a+b very busy in: [x := a+b]^1; [y := a+b]^2 if [a-b > a+b]^3 then [x := a+b]^4 else [y := a+b]^5 [q := r]^7; [z := a+b]^8; if [a>b]^9 then [x := a+b]^10 else [y := a+b]^11 if [a>b]^12 then [x := a+b]^13 else [y := 641]^14 ------------------------------------------ ------------------------------------------ VERY BUSY EXPRESSIONS ANALYSIS "For each program point, which [non-trivial] expressions must be very busy at the exit from the point." ------------------------------------------ What could we use this for? 2. formal definition ------------------------------------------ FORMAL DEFINITION VBexit(l) = if l \in final(S*) then {} else \bigcap { VBentry(l') | (l', l) \in flow^R(S*) } VBentry(l) = (VBexit(l) \ killVB(B^l)) \cup genVB(B^l) where B^l \in blocks(S*) killVB: Blocks* -> Powerset(Aexp*) killVB([x:= a]^l) = {a' \in Aexp* | x \in FV(a')} killVB([skip]^l) = {} killVB([b]^l) = {} genVB: Blocks* -> Powerset(Aexp*) genVB([x:= a]^l) = Aexp(a) genVB([skip]^l) = {} genVB([b]^l) = Aexp(b) ------------------------------------------ Is this a forward or backward analysis? Does this analysis need isolated exits? Why is there an intersection for VBexit? Do we want the largest or the smallest solution? 3. example ------------------------------------------ EXAMPLE if [a-b > a+b]^1 then [x := a+b]^2 else [y := a+b]^3; [z := a]^4; l killVB(l) genVB(l) ======================= 1 2 3 4 VBentry(1) = VBexit(1) = VBentry(2) = VBexit(2) = VBentry(3) = VBexit(3) = VBentry(4) = VBexit(4) = ------------------------------------------ E. Live Variables Analysis (2.1.4) 1. idea and goals ------------------------------------------ LIVE VARIABLES def: A variable x is *live* at exit from label l if there is a path from l to a use of x that does not redefine x. Which variables are live at exit from 1? [x := 3]^1; if [z > 0]^2 then [y := x+2]^3 else [q := q+1]^4 [x := 3]^1; [y := x+2]^2; [y := y+1]^3 [x := 3]^1; [z := 4]^2; [x := z+2]^3 while [z > 0]^4 do ([y := x+2]^5; [z := z-1]^6) ------------------------------------------ ------------------------------------------ LIVE VARIABLES ANALYSIS "For each program point, which variables may be live at the exit from the point." Example: [x := 3]^1; [z := 4]^2; [x := z+2]^3 ------------------------------------------ Which variables are live at label 1? Label 2? What can we use this for? 2. definitions and formalization ------------------------------------------ FORMAL DEFINITION LVexit(l) = if l \in final(S*) then {} else \bigcup { LVentry(l') | (l', l) \in flow^R(S*) } LVentry(l) = (LVexit(l) \ killLV(B^l)) \cup genLV(B^l) where B^l \in blocks(S*) kill: Blocks* -> Powerset(Var*) killLV([x:= a]^l) = {x} killLV([skip]^l) = {} killLV([b]^l) = {} genLV: Blocks* -> Powerset(Var*) genLV([x:= a]^l) = FV(a) genLV([skip]^l) = {} genLV([b]^l) = FV(b) ------------------------------------------ Is this a forward or backward analysis? Does this analysis need isolated exits? Why is there a union for LVexit? Do we want the largest or the smallest solution? ------------------------------------------ EXAMPLE [x := 3]^1; [z := 4]^2; [x := z+2]^3 l killLV(l) genLV(1) ========================== 1 2 3 LVentry(1) = LVexit(1) = LVentry(2) = LVexit(2) = LVentry(3) = LVexit(3) = ------------------------------------------ F. Derived Data Flow Information (2.1.5) ------------------------------------------ LINKING DEFINITIONS AND USES Use-definition (ud) chain: links use of var to its last assignment Definition-use (du) chain: links last assignment of var to its use ------------------------------------------ What might this be useful for? 1. formal definitions ------------------------------------------ DEFINITIONS AND USES definition clear path for x clear(x, l, l') = (\exists l1, ..., ln :: l = l1 & ln = l' & n > 0 & (\forall i : 1 <= i < n : (li, li+1) \in flow(S*)) & (\forall i : 1 <= i < n : not(def(x, li))) & use(x, ln)) def(x, l) = (\exists B : [B]^l \in blocks(S*) : x \in killLV([B]^l)) use(x, l) = (\exists B : [B]^l \in blocks(S*) : x \in genLV([B]^l)) ------------------------------------------ Why are the def and use functions correct? How do you interpret the notion of a clear path? Does clear(y, 3, 7) tell you anything about the use of y? ------------------------------------------ UD AND DU ud: Var* x Lab* -> Powerset(Lab*?) ud(x, l') = {l | def(x, l), (\exists l2 : (l, l2) \in flow(S*): clear(x, l2, l'))} \cup {? | clear(x, init(S*), l')} du: Var* x Lab*? -> Powerset(Lab*) du(x, l) = if l != ? then {l'| def(x, l), (\exists l2 : (l, l2) \in flow(S*): clear(x, l2, l'))} else {l'| clear(x, init(S*), l')} ------------------------------------------ Do these require isolated entries? Are these must or may analyses? Can we define du in terms of ud? 2. example ------------------------------------------ EXAMPLE [z := 3]^1; if [y > 0]^2 then [y := z+2]^3 else [y := y+1]^4 ud(x, l) l \ x | y z ===================== 1 2 3 4 du(x, l) l \ x | y z ===================== 1 2 3 4 ------------------------------------------ 3. computation How could we use RD and LV to compute ud chains? II. Monotone Frameworks (2.3) Can we identify some commonalities between different analyses? Would doing that help implement them? A. general pattern ------------------------------------------ GENERAL PATTERN A_o(l) = if l \in E then i else A_.(l) = where \bigsqcup is either \bigcup or \bigcap F is either flow(S*) or flow^R(S*) E is {init(S*)} or final(S*) i is initial/final information f(l) is the transfer function for blocks B^l \in blocks(S*) For a forward analysis: F is flow(S*) E is A_o gives the entry conditions A_. gives the exit conditions For a backward analysis: F is flow^R(S*) E is A_o gives the exit conditions A_. gives the entry conditions ------------------------------------------ B. basic definitions (2.3.1) 1. property space ------------------------------------------ PROPERTY SPACES def: a *property space*, L = (L, \bigsqcup), is a set with \bigsqcup: Powerset(L) -> L a join operation that makes it a complete lattice. Thus: l1 \sqcup l2 = \bigsqcup { l1, l2 } \bot = \bigsqcup {} l1 \sqsubseteq l2 = (l1 \sqcup l2 = l2) Examples: For reaching definitions: L = Powerset(Var* x Lab^?_*) \sqcup = \cup \sqsubseteq = \subseteq For available expressions: L = Powerset(AExp*) \sqcup = \cap \sqsubseteq = \supseteq ------------------------------------------ 2. transfer functions ------------------------------------------ TRANSFER FUNCTION SPACE def: Let L be a partially-ordered set. Then Funs is a *transfer function space for L* iff f \in Funs ==> f : L -> L and f is monotone, f,g \in Funs ==> f o g \in Funs, and id_L \in Funs. ------------------------------------------ 3. monotone framework ------------------------------------------ MONOTONE FRAMEWORK def: (L, Funs) is a monotone framework iff L is a property space and Funs is a transfer function space for L. def: (L, Funs, F, E, i, f_.) is an *instance of a monotone framework* if and only if: - (L, Funs) is a monotone framework, - F is a finite set of pairs (of flows), - E is a finite set of extremal labels, - i \in L is an extremal value, - f : (dom(F) \cup E) -> (L -> L) s.t. for l in (dom(F) \cup E) f_l \in Funs ------------------------------------------ C. examples (2.3.2) D. predicate abstraction (new topic) ------------------------------------------ PREDICATE ABSTRACTION Goal: verify program properties Idea: Use property space of the form L = Powerset(Preds) Preds = {P1, ..., Pn} where each Pi is a nullary predicate Interpretation: {P3,P5} means P3 and P5 may (must) be true (depending on kind of analysis) \sqcup is \cup (or \cap) \sqsubseteq is \subseteq (or \supseteq) Funs = monotonic (in \sqsubseteq) functions on L ------------------------------------------ For a may analysis, what's the bottom element? The top? How can you represent states? ------------------------------------------ PREDICATE ABSTRACTION EXAMPLE IsZero Analysis: At a given program point, which variables may be 0. "Abstract States" s \in L = Powerset(Preds) where Preds = {IsZero_x | x in Vars*} IsZero_y means y may be 0 F is flow(S*) E is {init(S*)} i is Preds fIZ(l) : L -> L, for l in Lab* fIZ(l)(s) = (s \ kill_IZ(B^l)(s)) \cup gen_IZ(B^l)(s) where B^l in blocks(S*) kill_IZ([x := a]^l)(s) = {IsZero_x} kill_IZ([skip]^l)(s) = {} kill_IZ([b]^l)(s) = {} gen_IZ([x := a]^l)(s) = {IsZero_x | (\exists cs \in \gamma(s) :: A[[a]]cs == 0)} gen_IZ([skip]^l)(s) = {} gen_IZ([b]^l)(s) = {} \gamma: L -> Store \gamma(s) = {cs | cs: Var* -> Int, IsZero_x \in s ==> cs(x) == 0} ------------------------------------------ What kind of analysis is this? Why this initial value i? What do the gen and kill functions do? What are the equations for IZ_entry(l) and IZ_exit(l)? ------------------------------------------ EXAMPLE [y := 3]^1; while [y>0]^2 do ([q := y-2]^3; [y := y-1]^4); [q := q div y]^5 Var* = {y, q} Preds = {IsZero_y, IsZero_q} IZ_entry(1) = IZ_exit(1) = IZ_entry(2) = IZ_exit(2) = IZ_entry(3) = IZ_exit(3) = IZ_entry(4) = IZ_exit(4) = IZ_entry(5) = IZ_exit(5) = ------------------------------------------ E. equation solving (2.4) 1. MFP (Maximal Fixed Point) solution (2.4.1) 2. MOP solution (2.4.2) (skip)