I. The Nature of Program Analysis (1.1) A. goals What are the goals of program analysis? B. ideas What are the main ideas of program analysis? What's the difference between a semantics-based approach and a semantics-directed approach? C. applications What other areas, besides optimizing compilers, could the ideas and goals of program analysis usefully be applied? II. Setting the Scene (1.2) A. The While Language 1. concrete syntax ------------------------------------------ CONCRETE SYNTAX FOR THE WHILE LANGUAGE % ---- Regular (lexical) syntax ---- ::= \r | \n | \r\n ::= "any character except a " ::= % ::= | " " | \t | \f ::= != | == | < | > | <= | >= ::= + | - ::= * | / ::= "any letter" | ' | _ ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ::= ( | )* ::= (-)?()+ % ---- Context-Free syntax ---- ::= ::= '{' '}' ::= | ; ::= := | skip | | while do | if then else ::= | ::= | ::= | | ( ) ::= | or ::= | and ::= | not | | ( ) ::= true | false ------------------------------------------ 2. abstract syntax ------------------------------------------ ABSTRACT SYNTAX OF THE WHILE LANGUAGE a \in AExp "arithmetic expressions" b \in BExp "Boolean expressions" S \in Stmt "statements" x,y \in Var "variables" n \in Num "numeric literals" l \in Lab "labels" opa \in Op_a "arithmetic operators" opb \in Op_b "Boolean operators" opr \in Op_r "relational operators" S ::= [x:= a]^l | [skip]^l | S1 ; S2 | if [b]^l then S1 else S2 | while [b]^l do S a ::= x | n | a1 opa a2 b ::= true | false | not b | b1 opb b2 | a1 opr a2 ------------------------------------------ What's the difference between abstract and concrete syntax? ------------------------------------------ EXAMPLE [y := x]^1; [z := 1]^2; while [y>1]^3 do ([z := z*y]^4; [y := y-1]^5); [y := 0]^6 ------------------------------------------ Why are the labels only attached to certain places in the abstract syntax trees? Which blocks are elementary? What is used to disambiguate different parse trees? 3. semantics What does the example above do? What does skip do? What kind of numeric literals are in this language? What would be some reasonable arithmetic operators? One would be some reasonable relational operators? What would be some reasonable Boolean operators? What's the type of the relational operators? Can the labels be repeated? What's missing? 4. variations ------------------------------------------ VARIATION 1: CONTROL FEATURES S ::= ... | break | for x in a1 .. a2 do S | throw | try S1 catch S2 VARIATION 2: SPECIFICATION FEATURES S ::= ... | assert b | assume b | choose S1 or S2 VARIATION 3: PARALLELISM S ::= ... | S1 `||' S2 VARIATION 4: FUNCTIONS a ::= ... | fn x => a | a1 a2 | let d in a d ::= x = a | d1; d2 ------------------------------------------ Which of these need labels, and where do the labels go? B. reaching definitions analysis 1. definition ------------------------------------------ REACHING DEFINITIONS (ASSIGNMENTS) def: Let P be a program. An assignment [x := a]^l *may reach* a program point in P if in some execution of P, when execution reaches that point, the last assignment to x was done at l. ------------------------------------------ What can we tell for certain from this analysis's results? 2. examples ------------------------------------------ EXAMPLE [y := x]^1; [z := 1]^2; while [y>1]^3 do ([z := z*y]^4; [y := y-1]^5); [y := 0]^6 (y,1) reaches entry to 2 ------------------------------------------ What definitions reach the entry for label 3? What definitions can't reach the entry for label 3? How do we talk about uninitialized variables? In table 1.1, what if we add (y,1) to RDentry(5)? In table 1.1, what if we remove (y,5) from RDentry(6)? ------------------------------------------ FOR YOU TO DO Compute a table like table 1.1 for: [t := x]^1; [x := y]^2; [y := t]^3 if [y>x]^4 then [r := y]^5 else [r := x]^6; assert [r >= x and r >= y]^7 ------------------------------------------ 3. variations What would the analysis look like if we only wanted to keep track of which variables were assigned? What would the analysis look like if we want to keep track of what variables could influence the values of other variables? III. Dataflow Analysis (1.3) A. goals B. idea What's the basic idea? What's a data flow graph? How is that used to model the semantics? C. example ------------------------------------------ EXAMPLE [q := 0]^1; [r := x]^2; while [r >= y]^3 do ([r := r-y]^4; [q := q+1]^5); assert [0<=r and r Powerset(Var*) where Lab* = set of labels in program Var* = set of variables in prog block Equation ======================================= [x:=a]^l AVexit(l) = [skip]^l AVexit(l) = [b]^l AVexit(l) = How are edges connected? ------------------------------------------ Why are these the right equations? How does this work out for our example? a. algorithm for solving the equations What can we do to solve a set of simultaneous equations? ------------------------------------------ LEAST SOLUTION F: (Powerset(Var*))^12 -> (Powerset(Var*))^12 F is defined by: F(AV_1, ..., AV_{12}) = (F_1(AV_1, ..., AV_{12}), F_2(AV_1, ..., AV_{12}), ..., F_{12}(AV_1, ..., AV_{12})) where F_1(AV_1, ..., AV_{12}) = {(q,?),(r,?),(x,?),(y,?)} F_2(AV_1, ..., AV_{12}) = AV_1 union {x} Solution (AVentry(1), AVexit(1), ..., AVexit(6)) is a solution if (AVentry(1), AVexit(1), ..., AVexit(6)) = F(AVentry(1), AVexit(1), ..., AVexit(6)) ------------------------------------------ Would this still be a 12-tuple if there were seven elementary blocks? How to compare such tuples? What does the lattice structure of (Powerset(Var* x Lab*))^12 look like? How can one find the fixed point using this information? 2. the constraint based approach What are the constraints? How do the elementary blocks work? How do the flows work? What's the connection to the equations? IV. constraint based analysis (1.4) A. goals What's the difference between a data flow analysis and a control flow analysis? Why isn't control flow obvious in all languages? B. setting ------------------------------------------ SETTING 1. Convert all control structures to functions. 2. Analysis finds what functions can be called from each point ------------------------------------------ Why switch to a functional language for this section? ------------------------------------------ CONTINUATION PASSING STYLE An intermediate language with one control structure Idea: every expression takes a "continuation" to which it sends its result Examples: x < 0 ==> [fn k => [[[%< x] 0] k]] if [x < 0] then [y := 22] else [z := 33] ==> [fn k => [[[%< x] 0] [%if [[y := 22] k] [[z := 33] k]]]] ------------------------------------------ ------------------------------------------ LANGUAGE (p. 140) Work in a functional language: e \in Exp t \in Term f,x \in Var c \in Const op \in Op l \in Lab e ::= t^l t ::= c | x | fn x => e_0 "non-recursive fun" | fun f x => e_0 "recursive fun def" | e_1 e_2 | if e_0 then e_1 else e_2 | let x = e_1 in e_2 | e_1 op e_2 ------------------------------------------ What are the atomic blocks being labeled here? C. idea What are the main ideas in this approach? ------------------------------------------ IDEAS OF CONSTRAINT BASED ANALYSIS - assume no side effects ==> associate information with labels - use a pair of functions, (C,p): C: Lab* -> Powerset(Value) C(l) contains possible values for subexpression at label l p: Var* -> Powerset(Value) p(x) constains possible values for variable x ------------------------------------------ What's the alternative to associating information directly with labels? How could this information be useful in an object-oriented program? How is this different than a type system? ------------------------------------------ APPROACH - collect constraints for function abstractions: e.g., given [fn x => [x]^1]^2 get {[fn x => [x]^1]} \subseteq C(2) ------------------------------------------ What's the value of a function definition? Why do they just use the term instead of a closure? What's the general pattern here? ------------------------------------------ for variables: e.g., given [x]^1 get p(x) \subseteq C(1) ------------------------------------------ What's the general pattern here? ------------------------------------------ for applications: e.g., given [[f]^1 [e]^2]^3 get {v | g \in C(1), a \in C(2), and v = (g a)} \subseteq C(3) ------------------------------------------ What's the general rule? What happens in [[fn x => [x]^1]^2 [fn y => [y]^3]^4]^5 ? What would be the constraints for an expression of the form [[e1]^1 op [e2]^2]^3 ? What would be the constraints for an if-then-else expression? What would be the constraints for a let expression, like let x = e1 in e2 ? V. Abstract Interpretation (1.5) A. goals B. idea What's the basic idea? What's a collecting semantics? How is that used to extract the analysis? C. example ------------------------------------------ EXAMPLE [q := 0]^1; [r := x]^2; while [r >= y]^3 do ([r := r-y]^4; [q := q+1]^5); assert [0<=r and r (Powerset(Trace))^{12} G is defined by: G(CS_1, ..., CS_{12}) = (G_1(CS_1, ..., CS_{12}), G_2(CS_1, ..., CS_{12}), ..., G_{12}(CS_1, ..., CS_{12})) where G_1(CS_1, ..., CS_{12}) = {()} G_2(CS_1, ..., CS_{12}) = {tr : q | tr \in CS_1} G_3(CS_1, ..., CS_{12}) = CS_2 ... Solution (CSentry(1), CSexit(1), CSentry(2), ..., CSentry(6), CSexit(6)) is a solution if G(CSentry(1), CSexit(1), CSentry(2), ..., CSentry(6), CSexit(6)) = (CSentry(1), CSexit(1), CSentry(2), ..., CSentry(6), CSexit(6)) ------------------------------------------ What is G_1? Why does it have 12 parameters? What is G_2? Why is G_3 = CS_2 in this example? What's the ordering on the solution space? What does it mean for G to be monotone? 2. Galois connections ------------------------------------------ ABSTRACTION AND CONCRETIZATION abstraction function for AV analysis: a: Powerset(Trace) -> Powerset(Var) a(X) = { x | x \in tr, tr \in X } concretization function for AV analysis: g: Powerset(Var) -> Powerset(Trace) g(Y) = { tr | (\forall x \in Y :: x \in tr))} Adjunction, or Galois connection: a(X) \subseteq Y <==> X \subseteq g(Y) ------------------------------------------ Would a and g be different for the RD analysis? ------------------------------------------ set of traces set of vars |---------------| |---------------| | | g | | | g(Y) <---------------- Y | | U| | | U| | | X -----------------> a(X) | | | a | | |_______________| |_______________| ------------------------------------------ 3. calculating the analysis Why do we care about the abstraction and concretization functions? ------------------------------------------ CALCULATING THE ANALYSIS Extend a and g pointwise to tuples: a(X_1, ..., X_12) = (a(X_1), ..., a(X_12)) g(Y_1, ..., Y_12) = (g(Y_1), ..., g(Y_12)) Define the AV analysis by the function a o G o g: Powerset(Var)^12 -> Powerset(Var)^12 so for each i in {1..12} (a o G_i o g): Powerset(Var)^12 -> Powerset(Var) by a(G_1(g(AV_1, ..., AV_12))) = a({()}) = {} a(G_2(g(AV_1, ..., AV_12))) = a({tr : q | tr \in g(AV_1)}) So a solution (AVentry(1), ..., AVexit(6)) has the property that (AVentry(1), ..., AVexit(6)) = (a o G o g)(AVentry(1), ..., AVexit(6)) ------------------------------------------ What's AV_3? How does this compare to the analysis we created by hand? What's the benefit of doing things this way? VI. Type and Effect Systems (1.6) A. goals B. idea What's the basic idea? C. annotated type system example 1. annotated base types ------------------------------------------ ANNOTATED TYPE SYSTEM EXAMPLE [ass] [x := a]^l : AV1 -> AV2 if AV2 = AV1 \cup {x} [skip] [skip]^l : AV -> AV S1 : AV1 -> AV2, S2: AV2 -> AV3 [seq] -------------------------------- S1; S2 : AV1 -> AV3 S1 : AV1 -> AV2, S2: AV1 -> AV2 [if] -------------------------------- if [b]^l then S1 else S2 : AV1 -> AV2 S : AV1 -> AV1 [wh] -------------------------------- while [b]^l then S : AV1 -> AV1 S : AV2 -> AV3 [sub] -------------- if AV1 \subseteq AV2, S : AV1 -> AV4 AV3 \subseteq AV4 ------------------------------------------ What's the meaning of the basic types? What's the meaning of the arrow types, like AV1 -> AV4? How would you explain the assignment rule? How would you explain the if rule? Why don't we have to check the type of the condition in an if? How would you explain the while rule? How would you explain the sub rule? a. type checking ------------------------------------------ EXAMPLE [q := 0]^1; [r := x]^2; while [r >= y]^3 do ([r := r-y]^4; [q := q+1]^5); assert [0<=r and r AV2, [ass] [r := x]^2: AV2 -> AV3 [ass] ___________________________________ [seq] ([q := 0]^1;[r := x]^2) : AV1 -> AV3 if AV3 = AV2 \cup {r}, AV2 = AV1 \cup {q} [r := r-y]^4: AV4 -> AV4 \cup {r}, [ass] [q := q+1]^5: AV5 -> AV5 \cup {q} [ass] ___________________________________ [seq] ([r := r-y]^4;[q := q+1]^5) : AV4 -> AV6 if AV6 = AV5 \cup {q} AV5 = AV4 \cup {r} ___________________________________[sub] ([r := r-y]^4;[q := q+1]^5) : AV6 -> AV6 if AV6 \subseteq AV4 ___________________________________[wh] while [r >= y]^3 do ([r := r-y]^4;[q := q+1]^5) : AV6 -> AV6 so by the seq rule, ------------------------------------------ So what are all the constraints? ------------------------------------------ CONSTRAINTS AV2 = AV1 \cup {q} AV3 = AV2 \cup {r}, AV5 = AV4 \cup {r} AV6 = AV5 \cup {q} AV6 \subseteq AV4 AV3 = AV6 AV6 = AV7 ------------------------------------------ So, what's a solution? 2. annotated type constructors ------------------------------------------ EXAMPLE Judgments: XMust S : Sigma -------------> Sigma YMay Where XMust and YMay are sets of variables. {x} [ass] [x := a]^l : Sigma ----> Sigma {x} {} [skip] [skip]^l : Sigma -----> Sigma {} X1 S1 : Sigma -----> Sigma, Y1 X2 S2 : Sigma -----> Sigma Y2 [seq] -------------------------------- X3 S1; S2 : Sigma ----> Sigma Y3 if X3 = X1 \cup X2, Y3 = Y1 \cup Y2 X1 S1 : Sigma -----> Sigma, Y1 X2 S2 : Sigma -----> Sigma Y2 [if] -------------------------------- if [b]^l then S1 else S2 X3 : Sigma ----> Sigma Y3 if X3 = X1 \cap X2, Y3 = Y1 \cup Y2 X S : Sigma -----> Sigma Y [wh] -------------------------------- while [b]^l then S {} : Sigma ----> Sigma Y X S : Sigma ----> Sigma Y [sub] ---------------------- X' S : Sigma -----> Sigma Y' if X' \subseteq X, Y \subseteq Y' ------------------------------------------ Why does the assignment rule make sense? How do you explain the if rule? How do you explain the while rule? How would you deal with assert statements? How would you deal with a try statement? ------------------------------------------ TYPE CHECKING EXAMPLE TYPE CHECKING Idea: accumulate constraints. {q} [q := 0]^1: Sigma ---> Sigma , [ass] {q} {r} [r := x]^2: Sigma ---> Sigma [ass] {r} _________________________________ [seq] ([q := 0]^1;[r := x]^2) {q,r} : Sigma ----> Sigma {q,r} {r} [r := r-y]^4: Sigma ---> Sigma, [ass] {r} {q} [q := q+1]^5: Sigma ---> Sigma [ass] {q} _________________________________ [seq] ([r := r-y]^4;[q := q+1]^5) {q,r} : Sigma ----> Sigma {q,r} ___________________________________[wh] while [r >= y]^3 do ([r := r-y]^4;[q := q+1]^5) {} : Sigma ----> Sigma {q,r} so by the seq rule, ------------------------------------------ How is this different than with annotated base types? D. effect systems 1. example: call tracking analysis ------------------------------------------ EXAMPLE Judgments: Gamma |- e : t & phi where Gamma : Var -> Type e : Expression t : Type phi : Effect phi Type = int | bool | t1 ---> t2 phi : Powerset(FunName) [var] Gamma |- x : t & {}, if Gamma(x) = t Gamma[x |-> tx] |- e : t & phi [fn] -------------------------------- Gamma |- fn_pi x => e phi2 : tx ------> t & {} if phi2 = phi \cup {pi} phi Gamma |- e1 : t2 ---> t & phi1, Gamma |- e2 : t2 & phi2 [app] -------------------------------- Gamma |- e1 e2 : t & phi3 if phi3 = phi1 \cup phi2 \cup phi ------------------------------------------ What do the judgments mean? What do the arrow types mean? How would you explain the fn rule? How would you explain the app rule?