COP 5021 Lecture -*- Outline -*-

* Interprocedural Analysis (2.5)

  This section looks at analysis for languages with procedures,
  including the structural operational semantics for such a language.

  The classic way of doing this is a whole-program (non-modular) analysis,
  but we will attempt a modular analysis,
  by making procedure summaries (specifications).

  Complications:
     matching calls and returns,
     parameter passing mechanisms,
     aliasing (from call by reference),
     higher-order procedures

** syntax

------------------------------------------
              SYNTAX

Procedures with 1 call-by-value parameter,
and 1 call-by-result parameter.

P \in Program
D \in Declaration

P ::= begin D; S end
D ::= proc p(val x, res y) is^ln S end^lx
    | D; D
S ::= ... | [call p(a,z)]^lc_lr

Example:

begin proc fact(val n, res v) is^1
        if [n == 0]^2
        then [v := 1]^3
        else ([call fact(n-1, v)]^4_5;
              [v:=v*n]^11)
      end^6;
      [call fact(3,v)]^7_8;
      [call fact(v,w)]^9_10
end
------------------------------------------

Q: In call p(a,z), what syntactic category is a? z?
   arithmetic expression, identifier

Q: What parameter passing mechanisms should be used?
   call by value for the 1st parameter, call by result for the second

Q: What assumptions would simplify the analysis?
   assume unique labels (label consistency)
   assume no redeclarations, only declared procedures are called
   assume names of all formals in each proc decl are distinct (x =/= y)

Q: What else do we need to do analysis and correctness proofs?
   operational semantics
   flows, etc.

** operational semantics (2.5.1)

Q: How is a procedure different from a macro?
   procedures have local variables (formals), different in each call,
   so names aren't sufficient to distinguish each instantiation
   of a formal parameter; we need locations...
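   A minimal sketch of why names alone are not enough, using the fact
   example from the syntax slide. This is an illustration, not the formal
   semantics: the names rho_global, store, fresh, and call_fact are
   hypothetical, the globals v and w are assumed to live at locations 0
   and 1, and the result formal is started at an arbitrary value (0 here).

```python
from itertools import count

# Hypothetical setup: globals v and w live at locations 0 and 1.
rho_global = {"v": 0, "w": 1}      # environment: variable name -> location
store = {0: 0, 1: 0}               # store: location -> integer
fresh = count(2)                   # supply of locations not yet in the store

def call_fact(arg_value, rho_caller, z):
    """Sketch of 'call fact(a, z)': n passed by value, v passed by result."""
    xi1, xi2 = next(fresh), next(fresh)        # fresh locations for this call only
    rho = {**rho_global, "n": xi1, "v": xi2}   # formals shadow globals in the callee
    store[xi1] = arg_value                     # call by value: copy the argument in
    store[xi2] = 0                             # result formal starts at an arbitrary value
    if store[rho["n"]] == 0:
        store[rho["v"]] = 1
    else:
        call_fact(store[rho["n"]] - 1, rho, "v")
        store[rho["v"]] = store[rho["v"]] * store[rho["n"]]
    # call by result: copy back through the caller's environment
    store[rho_caller[z]] = store[rho["v"]]

call_fact(3, rho_global, "v")
```

   After the call, the global v holds 6, and each of the four activations
   of fact has its own location for n (holding 3, 2, 1, and 0), even
   though all of them use the same name.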
------------------------------------------
        OPERATIONAL SEMANTICS

xi in Loc                        locations
rho in Env = Var* -> Loc         environments
s in Store = Loc ->_fin Z        stores
(rho,s) in State = Env x Store   states

Assume s composed-with rho is total:
     ran(rho) \subseteq dom(s)
------------------------------------------
   (the book uses \varsigma for this kind of store, instead of \sigma)

   Loc ->_fin Z is the set of partial functions with a finite domain

Q: How do these states relate to the states we had previously?
   the old ones are the composition of a store and an environment

Q: How should we deal with global variables?
   top-level environment, rho*, assumed injective

------------------------------------------
        OPERATIONAL SEMANTICS

[skip] rho |-* ([skip]^l, s) --> s

[asgn] rho |-* ([x:=a]^l, s)
          --> s[rho(x) |-> A[[a]](s o rho)]
       if s o rho is total
------------------------------------------
   s o rho is total means that range(rho) \subseteq dom(s)

Q: What would the sequence rules look like? while? if?
   essentially as before, with the addition of rho to the left of the turnstile

          rho |- (S1,s) --> (S1',s')
   [seq1] -----------------------------------
          rho |- (S1; S2, s) --> (S1';S2, s')

          rho |- (S1,s) --> s'
   [seq2] -------------------------------
          rho |- (S1; S2, s) --> (S2, s')

   [wh1] rho |- (while [b]^l do S, s)
            --> (S; while [b]^l do S, s)
         if B[[b]](s o rho) = true

   [wh2] rho |- (while [b]^l do S, s) --> s
         if B[[b]](s o rho) = false

   [if1] rho |- (if [b]^l then S1 else S2 , s) --> (S1, s)
         if B[[b]](s o rho) = true

   [if2] rho |- (if [b]^l then S1 else S2 , s) --> (S2, s)
         if B[[b]](s o rho) = false

Q: What would the calls look like in the operational semantics?
   two parts: evaluation of actual parameters, and parameter passing

------------------------------------------
         CALL AND BIND RULES

[call] rho |-* ([call p(a,z)]^lc_lr, s)
          --> (bind rho'' in S then z := y, s'')
       if xi1, xi2 not in dom(s), v in Z,
          proc p(val x, res y) is^ln S end^lx is in D*,
          rho'' = rho*[x |-> xi1, y |-> xi2]
          s'' = s[xi1 |-> A[[a]](s o rho), xi2 |-> v]

         rho' |-* (S, s) --> (S', s')
[bind1]----------------------------------
         rho |-* (bind rho' in S then z := y, s)
            --> (bind rho' in S' then z := y, s')

         rho' |-* (S, s) --> s'
[bind2]----------------------------------
         rho |-* (bind rho' in S then z := y, s)
            --> s'[rho(z) |-> s'(rho'(y))]
------------------------------------------

Q: What is rho* in the call rule?
   The global (top-level) environment.

Q: Where is bind-in-then in the surface syntax of While?
   It isn't part of the surface syntax;
   it's just used for the operational semantics.

Q: Why add bind-in-then to the language?
   To ensure static scoping and enforce the result parameter.
   Scoping happens because we use rho'' during evaluation of the body,
   and then when the call finishes, the semantics continues using rho.

Q: How do you parse the first rule?
   The transition scheme and a side condition

Q: What is [bind2] doing?
   termination of the call, pass by result for z

Q: In [bind2], why is rho(z) used instead of rho'(z)?
   Because the location of the result argument is given by
   the surrounding scope, so this is correct.

Q: Do we have to deal with bind-in-then for proofs?
   yes!

------------------------------------------
     EXAMPLE SEMANTIC CALCULATION

Let P be
      proc fact(val n, res v) is^1
        if [n == 0]^2
        then [v := 1]^3
        else ([call fact(n-1, v)]^4_5;
              [v:=v*n]^11)
      end^6;
      [call fact(3,v)]^7_8;
      [call fact(v,w)]^9_10

Name each statement S_i if it has label i
(and for calls, use the top label).

Let S_2 be if [n == 0]^2 then [v := 1]^3 else (...)
Let S_79 be ([call fact(3,v)]^7_8; [call fact(v,w)]^9_10)

Let rho* = {v |-> 0, w |-> 1}
    s00 = {0 |-> 0, 1 |-> 0}

(and for later use)
Let rho1 = rho*[n |-> 2, v |-> 3]
         = {n |-> 2, v |-> 3, w |-> 1}
    s0039 = {0 |-> 0, 1 |-> 0, 2 |-> 3, 3 |-> 9}
    rho2 = rho*[n |-> 4, v |-> 5]
         = {n |-> 4, v |-> 5, w |-> 1}
    s0027 = {0 |-> 0, 1 |-> 0, 2 |-> 3, 3 |-> 9, 4 |-> 2, 5 |-> 7}

Calculate in the context of rho*:

   rho* |-* (S_79, s00)
   -->
     * ([call fact(3,v)]^7_8, s00)
       --> (bind rho*[n |-> 2, v |-> 3] in S_2 then v := v,
            s00[2 |-> 3, 3 |-> 9])
       = (bind rho1 in S_2 then v := v, s0039)
     . (bind rho1 in S_2 then v := v; S_9, s0039)
   -->
     * rho* |-* (bind rho1 in S_2 then v := v, s0039)
       -->
     * rho1 |-* (if [n == 0]^2 then [v := 1]^3
                 else ([call fact(n-1, v)]^4_5; [v:=v*n]^11), s0039)
       -->
     . ([call fact(n-1, v)]^4_5; [v:=v*n]^11, s0039)
     . (bind rho1 in ([call fact(n-1, v)]^4_5; [v:=v*n]^11)
        then v := v, s0039)
     . (bind rho1 in ([call fact(n-1, v)]^4_5; [v:=v*n]^11)
        then v := v; S_9, s0039)
   -->
     * rho* |-* (bind rho1 in ([call fact(n-1, v)]^4_5; [v:=v*n]^11)
                 then v := v, s0039)
       -->
     * rho1 |-* (([call fact(n-1, v)]^4_5; [v:=v*n]^11), s0039)
       --> ((bind rho2 in S_2 then v := v), s0027)
     . (((bind rho2 in S_2 then v:= v); [v:=v*n]^11), s0027)
     . (((bind rho1 in ((bind rho2 in S_2 then v := v); [v:=v*n]^11)
          then v := v); [v:=v*n]^11), s0027)
     . ((((bind rho1 in ((bind rho2 in S_2 then v := v); [v:=v*n]^11)
           then v := v); [v:=v*n]^11); S_9), s0027)
   -->
     * rho* |-* (((bind rho1 in ((bind rho2 in S_2 then v := v); [v:=v*n]^11)
                   then v := v); [v:=v*n]^11), s0027)
       -->
     * rho* |-* (bind rho1 in ((bind rho2 in S_2 then v := v); [v:=v*n]^11),
                 s0027)
       -->
     * rho1 |-* (((bind rho2 in S_2 then v := v); [v:=v*n]^11), s0027)
       -->
     * rho1 |-* ((bind rho2 in S_2 then v := v), s0027)
       -->
     * rho2 |-* (S_2, s0027)
       = rho2 |-* (if [n == 0]^2 then ..., s0027)
       --> (([call fact(n-1,v)] ; [v:=v*n]^11), s0027)
     . ((bind rho2 in ([call fact(n-1,v)] ; [v:=v*n]^11) then v:=v), s0027)
     .
       (((bind rho2 in ([call fact(n-1,v)] ; [v:=v*n]^11) then v:=v);
         [v:=v*n]^11), s0027)
     . ((bind rho1 in ((bind rho2 in ([call fact(n-1, v)]; [v:=v*n]^11)
                        then v := v); [v:=v*n]^11)
         then v:=v), s0027)
     . (((bind rho1 in ((bind rho2 in ([call fact(n-1, v)]; [v:=v*n]^11)
                         then v := v); [v:=v*n]^11)
          then v := v); [v:=v*n]^11), s0027)
     . ((((bind rho1 in ((bind rho2 in ([call fact(n-1, v)]; [v:=v*n]^11)
                          then v := v); [v:=v*n]^11)
           then v := v); [v:=v*n]^11); S_9), s0027)
------------------------------------------
   could continue this...

** flow graphs (non-modular)

Q: How should we make flow graphs for calls?

------------------------------------------
        FLOW GRAPHS FOR CALLS

init([call p(a, z)]^lc_lr) =
final([call p(a, z)]^lc_lr) =
blocks([call p(a, z)]^lc_lr) =
labels([call p(a, z)]^lc_lr) =
flow([call p(a, z)]^lc_lr) = {(lc;ln), (lx;lr)}
    if proc p(val x, res y) is^ln S end^lx is in D*
------------------------------------------

Q: What should these be?
   ... lc
   ... {lr}
   ... {[call p(a,z)]^lc_lr}
   ... {lc, lr}

Q: Why use semicolons for the flows?
   to see which ones are procedure flows...

Q: What would happen if p was a program variable?
   dynamic dispatch ==> harder to determine the exact code called;
   we would need another analysis to tell what code could possibly be called

------------------------------------------
      FLOW GRAPHS FOR PROCEDURES

For each procedure declaration
   proc p(val x, res y) is^ln S end^lx

init(p) =
final(p) =
blocks(p) = {is^ln, end^lx} \cup blocks(S)
labels(p) =
flow(p) =
------------------------------------------

Q: What should these be?
   ... ln
   ... {lx}
   ... {ln,lx} \cup labels(S)
   ... {(ln,init(S))} \cup flow(S) \cup {(l,lx)|l \in final(S)}

------------------------------------------
       FLOW GRAPHS FOR PROGRAMS

For program P* = begin D* S* end

init* = init(S*)
final* = final(S*)
blocks* = blocks(S*)
          \cup \bigcup {blocks(p) | proc p... in D*}
labels* = labels(S*)
          \cup \bigcup {labels(p) | proc p... in D*}
flow* = flow(S*)
          \cup \bigcup {flow(p) | proc p...
                                    in D*}
------------------------------------------

Q: Is the flow graph for a program still finite?
   Yes.

Q: What is Lab* for such a program?
   labels*

*** precision problems

    The following is what we did just above...

------------------------------------------
       PROBLEMS WITH PRECISION

Suppose we treat flows (lc;ln) and (lx;lr)
the same as all other flows...

In terms of a monotone framework we get:

  A_.(l) = f_l(A_o(l))
  A_o(l) = \bigsqcup { A_.(l') | (l',l) in F or (l';l) in F}
           \sqcup i^l_E
     where i^l_E = i    if l \in E
       and i^l_E = \bot if l \not\in E
------------------------------------------

Q: Does the information from a call along (lc;ln) necessarily
   flow back to the associated label via (lx;lr)?
   No, so that causes a bunch of imprecision!
   All returns come from all calls!

------------------------------------------
 A SOLUTION IDEA: INTERPROCEDURAL FLOWS

filter flows to avoid returns from non-calls

inter-flow* = { (lc, ln, lx, lr)
              | P* contains [call p(a,z)]^lc_lr
                and proc p(val x, res y) is^ln S end^lx }

Notation: IF is an abstraction of inter-flow*

  for forward analysis:
     IF = inter-flow* \cap flow*
  for backward analysis:
     IF = inter-flow^R* \cap flow^R*
------------------------------------------

Q: How could we use IF (i.e., inter-flow* or its reverse)?
   it relates calls and returns

   Suppose we have (lpc, lpn, lpx, lpr) and (lqc, lpn, lpx, lqr)
   in inter-flow*.
   Then (lpc;lpn), (lqc;lpn), (lpx;lpr), and (lpx;lqr) will all be flows
   (see the def. of flow([call p(...)])),
   but there can't be a trace where (lpc;lpn) is followed by (lpx;lqr).
   Thus although it might appear that (lpc, lpn, lpx, lqr) could be
   a 4-tuple, that can't happen.

------------------------------------------
              EXAMPLE

begin proc fact(val n, res v) is^1
        if [n == 0]^2
        then [v := 1]^3
        else ([call fact(n-1, v)]^4_5;
              [v:=v*n]^11)
      end^6;
      [call fact(3,v)]^7_8;
      [call fact(v,w)]^9_10
end

What is flow*?

What is inter-flow*?
------------------------------------------
   ...
   flow* = {(7;1),(6;8),(8,9),(1,2),(2,3),(2,4),(3,6),(4;1),(6;5),
            (5,11),(11,6),(9;1),(6;10)}

   Draw the flow graph

Q: What else do we need to do analysis?
   (nothing?)

Q: What is the size of the flow graph in terms of the number of calls?
   it's linear

Q: What is the size of the flow graph for a recursive procedure?
   it's not infinite, as the recursive calls share the graph, see Fig 2.7

Q: Is this the right way to look at dataflow problems?
   it's not clear...

   ... inter-flow* = {(7,1,6,8),(4,1,6,5),(9,1,6,10)}

** a modular approach, procedure summaries

Q: What do we do for type checking of procedures?
   we have a type environment that tells what each does

   Idea: let's do the same thing for other kinds of analysis.

------------------------------------------
     PLAN FOR PROCEDURE SUMMARIES

0. Consider a call to be a kind of elementary block.

1. Compute the analysis information
   for each procedure p, based on its body

2. Summarize each procedure p with
   a transfer function
      summary(p):

3. Call statements have kill and gen
   (i.e., transfer) functions that
   use summary(p) and handle argument passing

4. Compute a fixed-point to solve for summaries

Iteration:
   start with a "bottom" summary:
      summary_0 = \p.\(i,v). \bot
   Then iterate construction of the summary
   until it reaches a fixed-point
------------------------------------------
   ... summary(p): Int x Var* -> L

   gives the effect of a call to p in terms of the property space
   (for the given actual arguments, the Int and the Var* elems)

   so summary: ProcName -> (Int x Var* -> L)

Q: What kind of dependency can summary(p) have on the Int argument?
   probably not much, if anything.
   Should ignore it by treating it symbolically.

*** example: reaching definitions

------------------------------------------
    EXAMPLE: REACHING DEFINITIONS

L = P(Var* x Lab*^?)

For analysis within a procedure,
   proc p(val n, res v) is^ln S end^lx
formal n is considered initialized at label ln

summary(p) = \(i,v).
               \bigcup {RD_exit(l)|l \in final(S)}

kill_RD([call p(a,z)]^lc_lr) = {(z,l)|l \in Lab*^?}
gen_RD([call p(a,z)]^lc_lr) = summary(p)(a,z) \cup {(z,lr)}
------------------------------------------
   Note that the summary is a purely local effect
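
   The kill and gen sets for a call can be turned into a transfer
   function in the usual way, (X \ kill) \cup gen. The following is only
   a sketch under assumptions: the names kill_call, gen_call,
   transfer_call, and bottom_summary are hypothetical, None stands for
   the label '?', and the actual a is passed symbolically as a string
   and ignored by the bottom summary.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Tuple

Def = Tuple[str, Optional[int]]      # (variable, label); None plays the role of '?'
RD = FrozenSet[Def]                  # an element of L = P(Var* x Lab*^?)
Summary = Callable[[str, str], RD]   # summary(p): (actual a, result variable z) -> L

def kill_call(z: str, labels: Iterable[int]) -> RD:
    """kill_RD: every definition of z, including the uninitialized one (z, ?)."""
    return frozenset((z, l) for l in labels) | {(z, None)}

def gen_call(summary_p: Summary, a: str, z: str, lr: int) -> RD:
    """gen_RD: whatever the summary reports, plus the definition of z at the return label."""
    return summary_p(a, z) | {(z, lr)}

def transfer_call(rd_in: RD, summary_p: Summary, a: str, z: str,
                  lr: int, labels: Iterable[int]) -> RD:
    """Transfer function for [call p(a,z)]^lc_lr: (rd_in minus kill) union gen."""
    return (rd_in - kill_call(z, labels)) | gen_call(summary_p, a, z, lr)

def bottom_summary(a: str, z: str) -> RD:
    """The starting point of the iteration: the summary that returns bottom (the empty set)."""
    return frozenset()

# [call fact(3,v)]^7_8 with v and w uninitialized on entry, labels 1..11
rd_entry: RD = frozenset({("v", None), ("w", None)})
rd_exit = transfer_call(rd_entry, bottom_summary, "3", "v", 8, range(1, 12))
```

   With the bottom summary, the only effect of the call visible to the
   caller is that z is (re)defined at the return label lr; later
   iterations would replace bottom_summary with a better approximation.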