COP 5021 Lecture -*- Outline -*-

* Intraprocedural Analysis (2.1)

Several examples of data flow analysis for the WHILE language.

Goal: learn how to define them precisely
      (precisely enough to implement on a computer)

Analyses are defined by pairs of entry/exit functions
that map labels to analysis information (sets).

The analysis operates on the control flow graph of the program...

** definitions and notation

The point of these is to represent the control flow graph.
These are all defined by structural induction.

*** initial and final labels

------------------------------------------
            INITIAL LABEL

init: Stmt -> Lab

init([x := a]^l) = l
init([skip]^l) = l
init(S1; S2) =
init(if [b]^l then S1 else S2) =
init(while [b]^l do S) =
------------------------------------------
        ... init(S1)
            l
            l

Q: What would the initial label of a nondeterministic choice statement be?
   Of a parallel composition statement?
Q: How would we generalize the formalism to handle such statements?
        perhaps return a set of labels.

------------------------------------------
            FINAL LABELS

final: Stmt -> Powerset(Lab)

final([x := a]^l) = {l}
final([skip]^l) = {l}
final(S1; S2) =
final(if [b]^l then S1 else S2) =
final(while [b]^l do S) =
------------------------------------------
        ... final(S2)
            final(S1) \cup final(S2)
            {l}

Q: What would the final label set of a nondeterministic choice statement be?
   Of a parallel composition statement?

*** blocks and labels

------------------------------------------
          ELEMENTARY BLOCKS

blocks: Stmt -> Powerset(Block)
------------------------------------------

Q: How would you define the set of elementary blocks in a statement?

        blocks([x := a]^l) = {[x := a]^l}
        blocks([skip]^l) = {[skip]^l}
        blocks(S1; S2) = blocks(S1) \cup blocks(S2)
        blocks(if [b]^l then S1 else S2)
            = {[b]^l} \cup blocks(S1) \cup blocks(S2)
        blocks(while [b]^l do S) = {[b]^l} \cup blocks(S)

Q: What would the set of blocks in an assert statement be?
Q: What would the set of blocks in a nondeterministic choice statement be?
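The structural definitions of init, final, and blocks above can be sketched in Python. This is only an illustration: the class names (Assign, Skip, Seq, If, While), the choice to keep expressions as strings, and the sample program `prog` are assumptions made here, not part of the notes or the book.

```python
from dataclasses import dataclass

# Hypothetical AST for WHILE statements; expressions are kept as plain strings.
@dataclass(frozen=True)
class Assign:          # [x := a]^l
    var: str
    expr: str
    label: int

@dataclass(frozen=True)
class Skip:            # [skip]^l
    label: int

@dataclass(frozen=True)
class Seq:             # S1; S2
    first: object
    second: object

@dataclass(frozen=True)
class If:              # if [b]^l then S1 else S2
    cond: str
    label: int
    then: object
    orelse: object

@dataclass(frozen=True)
class While:           # while [b]^l do S
    cond: str
    label: int
    body: object

def init(s):
    """Initial label: a single label, found by structural induction."""
    if isinstance(s, Seq):
        return init(s.first)
    return s.label     # Assign, Skip, If, While all start at their own label

def final(s):
    """Final labels: a set, because an if can end in either branch."""
    if isinstance(s, Seq):
        return final(s.second)
    if isinstance(s, If):
        return final(s.then) | final(s.orelse)
    return {s.label}   # Assign, Skip, and the test of a While

def blocks(s):
    """Elementary blocks, represented here as (text, label) pairs."""
    if isinstance(s, Seq):
        return blocks(s.first) | blocks(s.second)
    if isinstance(s, If):
        return {(s.cond, s.label)} | blocks(s.then) | blocks(s.orelse)
    if isinstance(s, While):
        return {(s.cond, s.label)} | blocks(s.body)
    if isinstance(s, Skip):
        return {("skip", s.label)}
    return {(f"{s.var} := {s.expr}", s.label)}

# Hypothetical program: [x := 1]^1; while [x < 3]^2 do [x := x+1]^3
prog = Seq(Assign("x", "1", 1), While("x < 3", 2, Assign("x", "x+1", 3)))
print(init(prog), final(prog), sorted(blocks(prog), key=lambda b: b[1]))
```

Returning a single label from init but a set from final mirrors the signatures on the slides; handling nondeterministic choice would require init to return a set as well.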
Q: How would you define the set of labels in a statement?

        labels: Stmt -> Powerset(Lab)
        labels(S) = {l | [B]^l \in blocks(S)}

*** flows and reverse flows

------------------------------------------
               FLOWS

flow: Stmt -> Powerset(Lab x Lab)

flow([x := a]^l) = {}
flow([skip]^l) = {}
flow(S1; S2) =
flow(if [b]^l then S1 else S2) =
flow(while [b]^l do S) =
------------------------------------------
        ... flow(S1) \cup flow(S2)
              \cup {(l, init(S2)) | l \in final(S1)}
            flow(S1) \cup flow(S2)
              \cup {(l, init(S1))} \cup {(l, init(S2))}
            flow(S) \cup {(l, init(S))}
              \cup {(l', l) | l' \in final(S)}

Q: How would we use these functions to represent the nodes and edges
   of a control flow graph?
Q: What are the labels and edges of
        if [x > 3]^1 then [y:=2]^2 else [z:=3]^3 ?
Q: How would you formulate a set of reverse flows?

        flow^R: Stmt -> Powerset(Lab x Lab)
        flow^R(S) = {(l', l) | (l, l') \in flow(S)}

Q: What is flow^R of
        if [x > 3]^1 then [y:=2]^2 else [z:=3]^3 ?
        flow = {(1,2),(1,3)}
        so flow^R = {(2,1),(3,1)}
Q: What are the initial nodes of flow^R?
        the final nodes of flow

*** program of interest

In the book, the stars are subscripts,
but the ASCII version here writes them inline (S* rather than S with a subscript *).

------------------------------------------
         PROGRAM OF INTEREST

S* = the top level statement

Lab* = labels(S*)

Var* = FV(S*)

Blocks* = blocks(S*)
------------------------------------------

------------------------------------------
      ISOLATED ENTRIES AND EXITS

def: S* has isolated entries iff

def: S* has isolated exits iff
------------------------------------------
        ... (\forall l \in Lab :: (l, init(S*)) \not\in flow(S*))
        ... (\forall l1 \in final(S*) ::
               (\forall l2 \in Lab :: (l1, l2) \not\in flow(S*)))

Q: What kind of programs would not have isolated entries?
        Those that start with a while loop
        (the end of the loop body flows back to the initial label).
Q: What kind of programs would not have isolated exits?
        Those that end with a while loop, possibly as the last statement
        of a branch of a final if-then-else
        (the final label is the loop test, which flows into the body).
Q: Could we convert programs to have both isolated entries and exits?
        Yes, add skip statements as necessary.
        So there is no loss of generality in assuming these.

------------------------------------------
           LABEL CONSISTENT

def: S is label consistent
     if and only if
     no two blocks in S have the same label.
------------------------------------------

Q: How would you formalize that?
        [B1]^l, [B2]^l \in blocks(S) ==> B1 == B2
Q: Is there any reason not to have label consistent programs?

** available expressions analysis (2.1.1)

Now we're going to look at 4 different analyses,
to illustrate the formalism
and to see how to precisely define dataflow analyses.

*** trivial and non-trivial expressions

Q: What's a trivial expression?
        a single variable or a constant,
        in other words, a base case

------------------------------------------
            SUBEXPRESSIONS

Aexp(a) = non-trivial arithmetic subexpressions of a

Aexp(b) = non-trivial arithmetic subexpressions of b

Aexp* = non-trivial arithmetic expressions in S*
------------------------------------------

*** idea, goal

------------------------------------------
    AVAILABLE EXPRESSIONS ANALYSIS

"For each program point,
 which [non-trivial] expressions must have already been computed,
 and not later modified,
 on all paths to that program point."

Example:

  [k := i*j-1]^1;
  while [i*j-1 < n]^2 do {
    [t := a+k]^3;
    [j := j+1]^4;
    [k := i*j-1]^5
  }
------------------------------------------

Q: What non-trivial expressions are available at entry to block 2?
        i*j and (i*j)-1

*** formalization

The basic idea is to define functions on each elementary block,
using two sub-functions: gen and kill...
------------------------------------------
          FORMAL DEFINITION

AEentry(l) = if l = init(S*)
             then {}
             else \bigcap {AEexit(l') | (l',l) \in flow(S*)}

AEexit(l) = (AEentry(l) \ killAE(B^l)) \cup genAE(B^l)
            where B^l \in blocks(S*)

killAE: Blocks* -> Powerset(Aexp*)
killAE([x:= a]^l) = {a' \in Aexp* | x \in FV(a')}
killAE([skip]^l) = {}
killAE([b]^l) = {}

genAE: Blocks* -> Powerset(Aexp*)
genAE([x:= a]^l) = {a' \in Aexp(a) | x \not\in FV(a')}
genAE([skip]^l) = {}
genAE([b]^l) = Aexp(b)
------------------------------------------

Q: What's the role of the control flow graph here?
        it's an implicit parameter, so these equations generate
        particular sets for every control flow graph (program).
Q: What does the kill function mean?
        It says what expressions are no longer available,
        since a variable they mention was assigned
        (what to take out of the analysis).
Q: What does the gen function mean?
        It says what expressions become available.
Q: Why don't we have to define the analysis for while loops and if statements?
        it's implicit in the CFG (the flow relation)
Q: What are we assuming with this formalism?
        That the program is label consistent. Why?
           because of the use of blocks(S*)
        That the program has isolated entries. Why?
           because of the use of l = init(S*)
Q: How would we adjust this if we didn't have isolated entries?
        (1) put a skip at the beginning of the program.
        (2) simulate the effect of that,
            by intersecting {} with the other exits

*** observations

Q: This is a forward analysis. Why?
        because we use flow, not flow^R
Q: What makes a "solution" unsafe?
        too much in the set
Q: What makes it imprecise?
        too little in the set
Q: We want the largest (safe) sets. Why?
        because we want more information to use in optimizations
Q: Note the use of the word "must",
   what impact does that have on the analysis?
        intersections of entry information,
        makes us want the largest set

*** example revisited

------------------------------------------
              EXAMPLE

  [k := i*j-1]^1;
  while [i*j-1 < n]^2 do
     ([t := a+k]^3;
      [j := j+1]^4;
      [k := i*j-1]^5)

What's the control flow graph?
------------------------------------------

------------------------------------------
           KILL AND GEN

What are killAE and genAE for this?

 l    killAE(l)       genAE(l)
 ============================
 1
 2
 3
 4
 5
------------------------------------------

------------------------------------------
          EXAMPLE EQUATIONS

AEentry(1) =
AEentry(2) =
AEentry(3) =
AEentry(4) =
AEentry(5) =

AEexit(1) =
AEexit(2) =
AEexit(3) =
AEexit(4) =
AEexit(5) =
------------------------------------------

Q: So what sets do we start with to find a solution?
        Aexp*, because this is a must (intersection) analysis
Q: So what would the solution be?

** Reaching Definitions Analysis (2.1.2)

------------------------------------------
 REACHING DEFINITIONS ANALYSIS (2.1.2)

"For each program point,
 which assignments may have been made
 and not overwritten,
 when program execution reaches that point
 along some path?"
------------------------------------------

Q: Is this a forward or backward analysis?
        forward, uses flow not flow^R
Q: What makes the analysis imprecise?
        larger sets
Q: So what solution do we want?
        the smallest set
Q: Note the use of the word "may" in the analysis statement,
   what impact does that have on the analysis?
        unions of entry information,
        makes us want the smallest set

** Very Busy Expressions Analysis (2.1.3)

*** idea and goals

This can be useful in hoisting expressions:
evaluating an expression once and storing it for later use
(e.g., in a register).

------------------------------------------
        VERY BUSY EXPRESSIONS

def: A non-trivial expression e is *very busy*
     at exit from block l if,
     on every path from l,
     e is used before any x \in FV(e) is assigned.
At what points is a+b very busy in:

  [x := a+b]^1; [y := a+b]^2

  if [a-b > a+b]^3 then [x := a+b]^4 else [y := a+b]^5

  [q := r]^7; [z := a+b]^8;
  if [a>b]^9 then [x := a+b]^10 else [y := a+b]^11

  if [a>b]^12 then [x := a+b]^13 else [y := 641]^14
------------------------------------------
        ... exits of 1, 3, 7, 8, 9, reading each fragment separately
            (not 12 or any of the others)

"an expression is very busy at exit from a block
 if it is very busy at the entry to every block that follows."

"However, no expressions are very busy
 at the exit from any final block"

------------------------------------------
    VERY BUSY EXPRESSIONS ANALYSIS

"For each program point,
 which [non-trivial] expressions must be very busy
 at the exit from the point."
------------------------------------------

Q: When should variables be live at the end?
        If a program is a procedure body and the variables are all
        either global or call-by-reference parameters
Q: What could we use this for?
        Hoisting, which is to store the value of the expression
        for later use.

*** formal definition

------------------------------------------
          FORMAL DEFINITION

VBexit(l) = if l \in final(S*)
            then {}
            else \bigcap { VBentry(l') | (l', l) \in flow^R(S*) }

VBentry(l) = (VBexit(l) \ killVB(B^l)) \cup genVB(B^l)
             where B^l \in blocks(S*)

killVB: Blocks* -> Powerset(Aexp*)
killVB([x:= a]^l) = {a' \in Aexp* | x \in FV(a')}
killVB([skip]^l) = {}
killVB([b]^l) = {}

genVB: Blocks* -> Powerset(Aexp*)
genVB([x:= a]^l) = Aexp(a)
genVB([skip]^l) = {}
genVB([b]^l) = Aexp(b)
------------------------------------------

Q: Is this a forward or backward analysis?
        backwards, because we use flow^R.
Q: Does this analysis need isolated exits?
        the authors say so, why?
        because of the initial condition for VBexit():
        a final label gets {} without looking at any flow out of it
Q: Why is there an intersection for VBexit?
        because this is a "must" analysis
        and we care about all paths
Q: Do we want the largest or the smallest solution?
        largest, because smaller is imprecise

*** example

------------------------------------------
              EXAMPLE

  if [a-b > a+b]^1
  then [x := a+b]^2
  else [y := a+b]^3;
  [z := a]^4

 l    killVB(l)     genVB(l)
 =======================
 1
 2
 3
 4

VBentry(1) =
VBexit(1) =
VBentry(2) =
VBexit(2) =
VBentry(3) =
VBexit(3) =
VBentry(4) =
VBexit(4) =
------------------------------------------

** Live Variables Analysis (2.1.4)

*** idea and goals

------------------------------------------
           LIVE VARIABLES

def: A variable x is *live* at exit from label l
     if there is a path from l to a use of x
     that does not redefine x.

Which variables are live at exit from 1?

  [x := 3]^1;
  if [z > 0]^2 then [y := x+2]^3 else [q := q+1]^4

  [x := 3]^1; [y := x+2]^2; [y := y+1]^3

  [x := 3]^1; [z := 4]^2; [x := z+2]^3
  while [z > 0]^4 do ([y := x+2]^5; [z := z-1]^6)
------------------------------------------

------------------------------------------
       LIVE VARIABLES ANALYSIS

"For each program point,
 which variables may be live
 at the exit from the point."

Example:

  [x := 3]^1; [z := 4]^2; [x := z+2]^3
------------------------------------------

Q: Which variables are live at exit from label 1? From label 2?
        none and z.

The authors take the view
   "that no variables are live at the end of the program";
however they remark that
   "for some applications it might be better to assume that
    all variables are live at the end of the program."

Q: What can we use this for?
        dead code elimination:
        if the variable is not live at exit from a block
        that assigns to it, the assignment can be eliminated.
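The path-based definition of liveness above can be checked directly with a graph search. The sketch below is a reading of that definition, not the analysis from the book: the function name `live_at_exit` and the encoding of a program as flow edges plus per-label def/use sets are assumptions made for illustration.

```python
def live_at_exit(x, l, flow, defs, uses):
    """True iff some path leaving label l reaches a use of x
    without passing through a redefinition of x first."""
    succ = {}
    for a, b in flow:
        succ.setdefault(a, set()).add(b)
    seen = set()
    stack = list(succ.get(l, ()))
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        # A block reads its operands before it writes its target,
        # so a use at n counts even if n also assigns x.
        if x in uses.get(n, set()):
            return True
        if x in defs.get(n, set()):
            continue            # x is redefined here; this path is blocked
        stack.extend(succ.get(n, ()))
    return False

# [x := 3]^1; [z := 4]^2; [x := z+2]^3
flow = {(1, 2), (2, 3)}
defs = {1: {"x"}, 2: {"z"}, 3: {"x"}}
uses = {3: {"z"}}

for label in (1, 2):
    live = {v for v in ("x", "z") if live_at_exit(v, label, flow, defs, uses)}
    print(label, live)
```

A search like this answers one (variable, label) query at a time; an equation-based formulation computes all the sets together as a fixed point.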
*** definitions and formalization

------------------------------------------
          FORMAL DEFINITION

LVexit(l) = if l \in final(S*)
            then {}
            else \bigcup { LVentry(l') | (l', l) \in flow^R(S*) }

LVentry(l) = (LVexit(l) \ killLV(B^l)) \cup genLV(B^l)
             where B^l \in blocks(S*)

killLV: Blocks* -> Powerset(Var*)
killLV([x:= a]^l) = {x}
killLV([skip]^l) = {}
killLV([b]^l) = {}

genLV: Blocks* -> Powerset(Var*)
genLV([x:= a]^l) = FV(a)
genLV([skip]^l) = {}
genLV([b]^l) = FV(b)
------------------------------------------

Q: Is this a forward or backward analysis?
        backwards, because we use flow^R.
Q: Does this analysis need isolated exits?
        the authors say so, why?
Q: Why is there a union for LVexit?
Q: Do we want the largest or the smallest solution?
        smallest, because larger is imprecise.
        We want to know exactly what the dead variables are.

------------------------------------------
              EXAMPLE

  [x := 3]^1; [z := 4]^2; [x := z+2]^3

 l    killLV(l)      genLV(l)
 ==========================
 1
 2
 3

LVentry(1) =
LVexit(1) =
LVentry(2) =
LVexit(2) =
LVentry(3) =
LVexit(3) =
------------------------------------------

** Derived Data Flow Information (2.1.5)

------------------------------------------
     LINKING DEFINITIONS AND USES

Use-definition (ud) chain:
   links each use of a variable to its last assignment(s)

Definition-use (du) chain:
   links each assignment of a variable to its use(s)
------------------------------------------

Q: What might this be useful for?
        dead code elimination
        code motion (reordering)

*** formal definitions

------------------------------------------
         DEFINITIONS AND USES

definition clear path for x

clear(x, l, l') =
   (\exists l1, ..., ln ::
       l = l1 & ln = l' & n > 0
     & (\forall i : 1 <= i < n : (li, li+1) \in flow(S*))
     & (\forall i : 1 <= i < n : not(def(x, li)))
     & use(x, ln))

def(x, l) = (\exists B : [B]^l \in blocks(S*) : x \in killLV([B]^l))

use(x, l) = (\exists B : [B]^l \in blocks(S*) : x \in genLV([B]^l))
------------------------------------------

Q: Why are the def and use functions correct?
Q: How do you interpret the notion of a clear path?
Q: Does clear(y, 3, 7) tell you anything about the use of y?
        yes, it's used at 7

------------------------------------------
             UD AND DU

ud: Var* x Lab* -> Powerset(Lab*?)

ud(x, l') = {l | def(x, l),
                 (\exists l2 : (l, l2) \in flow(S*) : clear(x, l2, l'))}
            \cup {? | clear(x, init(S*), l')}

du: Var* x Lab*? -> Powerset(Lab*)

du(x, l) = if l != ?
           then {l' | def(x, l),
                      (\exists l2 : (l, l2) \in flow(S*) : clear(x, l2, l'))}
           else {l' | clear(x, init(S*), l')}
------------------------------------------

Q: Do these require isolated entries?
        yes
Q: Are these must or may analyses?
        may
Q: Can we define du in terms of ud?

*** example

------------------------------------------
              EXAMPLE

  [z := 3]^1;
  if [y > 0]^2 then [y := z+2]^3 else [y := y+1]^4

 ud(x, l)
 l \ x |    y        z
 =====================
   1
   2
   3
   4

 du(x, l)
 l \ x |    y        z
 =====================
   1
   2
   3
   4
------------------------------------------

*** computation

Q: How could we use RD and LV to compute ud chains?
        see p. 54

        UD : Var* x Lab* -> Powerset(Lab*)
        UD(x,l) = if x \in genLV(B^l)
                  then {l' | (x,l') \in RDentry(l)}
                  else {}

Computing DU is an exercise.
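The UD formula above can be tried on the example program with a small sketch. These notes do not restate the reaching definitions equations, so the sketch assumes the usual formulation: RDentry(init(S*)) = {(x, ?) | x \in Var*}, union over predecessors elsewhere, and an assignment [x := a]^l replaces every pair for x by (x, l). The block encoding and the helper names are hypothetical.

```python
# [z := 3]^1; if [y > 0]^2 then [y := z+2]^3 else [y := y+1]^4
# Each block: (assigned variable or None, variables read); "read" plays genLV.
blocks = {1: ("z", set()), 2: (None, {"y"}), 3: ("y", {"z"}), 4: ("y", {"y"})}
flow = {(1, 2), (2, 3), (2, 4)}
init_label = 1
variables = {"y", "z"}

def reaching_definitions():
    """Smallest solution of the (assumed) RD equations, by iteration from {}."""
    rd_entry = {l: set() for l in blocks}
    rd_exit = {l: set() for l in blocks}
    changed = True
    while changed:
        changed = False
        for l, (target, _) in blocks.items():
            if l == init_label:
                entry = {(x, "?") for x in variables}
            else:
                entry = set().union(*(rd_exit[p] for (p, q) in flow if q == l))
            if target is None:
                exit_ = set(entry)
            else:
                # kill every definition of target, then gen (target, l)
                exit_ = {(x, d) for (x, d) in entry if x != target}
                exit_.add((target, l))
            if entry != rd_entry[l] or exit_ != rd_exit[l]:
                rd_entry[l], rd_exit[l] = entry, exit_
                changed = True
    return rd_entry

def UD(x, l, rd_entry):
    """UD(x,l) = {l' | (x,l') in RDentry(l)} if x is read at l, else {}."""
    _, read = blocks[l]
    if x not in read:
        return set()
    return {d for (y, d) in rd_entry[l] if y == x}

rd = reaching_definitions()
for l in sorted(blocks):
    print(l, {x: UD(x, l, rd) for x in sorted(variables)})
```

The "?" entries stand for uses that can be reached by the uninitialized value of a variable, matching the ? in the definition of ud.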