COP 5021 meeting -*- Outline -*-

* Type and Effect Systems (1.6)

** setting

------------------------------------------
                SETTING

Type checking is usually syntax-directed
   and compositional

So it can be implemented by:

------------------------------------------
        ... a recursive walk over the AST (of a program)

        e.g., each expression has a type,
        and the type is a function of the types of its subexpressions

** goals

        To fit program analysis into the syntax-directed machinery
        of type checking.

        To extend type systems with information about effects
        (or other semantic information).

** idea

        Q: What's the basic idea?

------------------------------------------
      TYPE AND EFFECT SYSTEMS (1.6)

Basic idea?

------------------------------------------
        ... Use types to express the analysis.
            Ignores control flow (conventionally)

        Two standard techniques:
          - annotated type systems
          - effect systems

** annotated type system example

*** type notation

------------------------------------------
     NATURAL DEDUCTION STYLE NOTATION

Type systems are written with rules
of the form

           G |- f : T1 -> T,  G |- e : T1
 [e-rule]  ______________________________  if C
           G |- f(e) : T

 [int c]   |- 0 : Int

------------------------------------------
        ... explain that:
          "[e-rule]" and "[int c]" are rule names,
          the horizontal line separates hypotheses (above)
             from the conclusion(s) below
          all (meta-)variable names used (like G, f, e, T1, T, C)
             are universally quantified over the entire rule
          ":" means "has type"
          "|-" (\vdash) means [the context to the left] proves
             the conclusion to the right
          "G |- e : T" is called a judgment
          Other symbols are typically in the grammar of the language
             or the type system.
          G (\Gamma) is usually a type environment,
             which is a finite mapping from variable names to types
          e is an expression,
          T (often \tau) is a type
          C is a condition (written in mathematical notation);
             the rule can be used in a proof (only) when C holds
          Axioms, like [int c], are rules with no hypotheses;
             they are often written as just the conclusion, as here
          Conventionally, the rules only state what can be proven;
             failure to find a proof means that the program is ill-typed
             We say that the proof gets "stuck" on type errors
             This assumes that there is only one way to do a proof
             (the system is deterministic or at least Church-Rosser,
              meaning it always comes to the same conclusion)

*** annotated type system example

        For taint analysis, we seek sets of variables that
        may have been tainted up to a program point.

------------------------------------------
  ANNOTATED TYPE SYSTEM FOR TAINT ANALYSIS

Types are sets of variables that may be tainted

 [asg]   [x := a]^l : T1 -> T2
            if T2 = (T1 - {x}) \cup {x | FV(a) \cap T1 \neq {} }

 [skip]  [skip]^l : T -> T

         S1 : T1 -> T2,  S2 : T2 -> T3
 [seq]   ____________________________
         S1; S2 : T1 -> T3

         S1 : T1 -> T2,  S2 : T1 -> T2
 [if]    ___________________________________
         if [b]^l then S1 else S2 : T1 -> T2

         S : T1 -> T1
 [wh]    _____________________________
         while [b]^l do S : T1 -> T1

 [re]    [read x]^l : T1 -> T2
            if T2 = T1 \cup {x}

 [sa]    [sanitize x]^l : T1 -> T2
            if T2 = T1 - {x}

 [pr]    [print x]^l : T1 -> T1
            if x \not\in T1

         S : T2 -> T3
 [sub]   _____________   if T1 \subseteq T2,
         S : T1 -> T4       T3 \subseteq T4
------------------------------------------

        Q: What's the meaning of the basic types?
           set of variables that might be tainted

        Q: What's the meaning of the arrow types, like T1 -> T4?
           if the flows into the statement have T1 as the set of
           possibly tainted variables, then at the end the statement
           taints no more than T4 (see the sub rule)

        Q: How would you explain the assignment rule?

        Q: How would you explain the if rule?
        Q: Why don't we have to check the type of the condition in an if?
           the grammar does that and it doesn't affect the tainting

        Q: How would you explain the while rule?

        Q: How would you explain the sub rule?
           subsumption; it expresses the meaning of arrow types (as above)
           Note: the subsumption rule is correct as written!

        Q: What would we need to change if we wanted to do an
           information flow security analysis instead of a taint analysis?
           Use the tainted variables tested in conditions to taint
           each assignment in the body (or bodies)

**** type checking

------------------------------------------
               EXAMPLE

   [y := 0]^1;
   [print y]^2;
   [read x]^3;
   while [x < 0]^4 do
      ([y := y+1]^5;
       [print y]^6;
       [read x]^7);
   [z := x]^8
------------------------------------------

------------------------------------------
             TYPE CHECKING

Idea: accumulate constraints.

 [y := 0]^1 : T1 -> T2   [asg]
 [print y]^2 : T2 -> T3  [pr]
 ___________________________________ [seq]
 ([y := 0]^1;[print y]^2) : T1 -> T3
     if T2 = T1-{y} and y \not\in T2
        and T3 = T2
------------------------------------------
        ...
        ([y := 0]^1;[print y]^2) : T1 -> T3  [seq]
        [read x]^3 : T3 -> T4                [re]
        ___________________________________ [seq]
        ([y := 0]^1;[print y]^2; [read x]^3) : T1 -> T4
            if T2 = T1-{y} and y \not\in T2
               and T3 = T2
               and T4 = T3 \cup {x}

        [y := y+1]^5 : T5 -> T6  [asg]
        [print y]^6 : T6 -> T7   [pr]
        ___________________________________ [seq]
        ([y := y+1]^5; [print y]^6) : T5 -> T7
            if T6 = (T5 - {y}) \cup ({y} \cap T5)
               and T7 = T6 and y \not\in T6

        ([y := y+1]^5;[print y]^6) : T5 -> T7  [seq]
        [read x]^7 : T7 -> T8                  [re]
        ___________________________________ [seq]
        ([y := y+1]^5; [print y]^6; [read x]^7) : T5 -> T8
            if T6 = (T5 - {y}) \cup ({y} \cap T5)
               and T7 = T6 and y \not\in T6
               and T8 = T7 \cup {x}

        If we try to apply the [wh] rule, we see that we have to have
        the same type (of form T -> T) for the body and the statement itself,
        but the body has type T5 -> T8, so we need to use the sub rule...
        ([y := y+1]^5; [print y]^6; [read x]^7) : T5 -> T8
        _________________________________________________ [sub]
        ([y := y+1]^5; [print y]^6; [read x]^7) : T9 -> T9
            if T9 \subseteq T5 and T8 \subseteq T9

        these constraints mean
            T9 \subseteq T5
            T6 = (T5 - {y}) \cup ({y} \cap T5)
            T7 = T6 and y \not\in T6
            T8 = T7 \cup {x}
            T8 \subseteq T9

        ([y := y+1]^5; [print y]^6; [read x]^7) : T9 -> T9  [sub]
        ___________________________________ [wh]
        while [x < 0]^4 do
           ([y := y+1]^5; [print y]^6; [read x]^7) : T9 -> T9

        ([y := 0]^1;[print y]^2; [read x]^3) : T1 -> T4  [seq]
        while [x < 0]^4 do
           ([y := y+1]^5; [print y]^6; [read x]^7) : T9 -> T9  [wh]
        _______________________________________________ [seq]
        ([y := 0]^1;[print y]^2; [read x]^3);
        while [x < 0]^4 do
           ([y := y+1]^5; [print y]^6; [read x]^7) : T1 -> T9
            if T4 = T9

        ([y := 0]^1;[print y]^2; [read x]^3);
        while [x < 0]^4 do
           ([y := y+1]^5; [print y]^6; [read x]^7) : T1 -> T9
        [z := x]^8 : T9 -> T10  [asg]
        _______________________________________________ [seq]
        (([y := 0]^1;[print y]^2; [read x]^3);
         while [x < 0]^4 do
            ([y := y+1]^5; [print y]^6; [read x]^7);
         [z := x]^8) : T1 -> T10
            if T10 = (T9-{z}) \cup {z | {x} \cap T9 \neq {} }

        Q: So what are all the constraints?

------------------------------------------
             CONSTRAINTS

   T2 = T1-{y}
   y \not\in T2
   T3 = T2
   T4 = T3 \cup {x}
   T6 = (T5-{y}) \cup ({y} \cap T5)
   T7 = T6
   y \not\in T6
   T8 = T7 \cup {x}
   T9 \subseteq T5
   T8 \subseteq T9
   T4 = T9
   T10 = (T9-{z}) \cup {z | {x} \cap T9 \neq {} }

So, what's a solution?

------------------------------------------
        ... If T1 = {}, we have
             T2 = {}   (note: y \not\in T2 follows)
             T3 = {}
             T4 = {x}
            let's guess T5 = {x}, by analogy to data flow analysis
             T6 = {x}
             T7 = {x}
             T8 = {x}
             T9 = {x}
             T10 = {x,z}

        Can solve constraints as previously.
        Can accumulate constraints in different orders.

*** annotated type constructors

        Ideas:
          - express modification to analysis results (deltas)
          - annotate the type constructors
            instead of using analysis results as the types.
          - simultaneously do definitely/possibly (must/may) analyses

------------------------------------------
               EXAMPLE

Judgments:
                  XMust
      S : Sigma -------------> Sigma
                  YMay

Where XMust and YMay are sets of variables
(that S must assign and may assign).

                             {x}
 [asg]   [x := a]^l : Sigma ----> Sigma
                             {x}

                            {}
 [skip]  [skip]^l : Sigma -----> Sigma
                            {}

                     X1
         S1 : Sigma -----> Sigma,
                     Y1
                     X2
         S2 : Sigma -----> Sigma
                     Y2
 [seq]   --------------------------------
                         X3
         S1; S2 : Sigma ----> Sigma
                         Y3
            if X3 = X1 \cup X2, Y3 = Y1 \cup Y2

                     X1
         S1 : Sigma -----> Sigma,
                     Y1
                     X2
         S2 : Sigma -----> Sigma
                     Y2
 [if]    --------------------------------
         if [b]^l then S1 else S2
                     X3
            : Sigma ----> Sigma
                     Y3
            if X3 = X1 \cap X2, Y3 = Y1 \cup Y2

                    X
         S : Sigma -----> Sigma
                    Y
 [wh]    --------------------------------
         while [b]^l do S
                     {}
            : Sigma ----> Sigma
                     Y

                    X
         S : Sigma ----> Sigma
                    Y
 [sub]   ----------------------
                    X'
         S : Sigma -----> Sigma
                    Y'
            if X' \subseteq X, Y \subseteq Y'
------------------------------------------

        Q: Why does the assignment rule make sense?

        Q: How do you explain the if rule?

        Q: How do you explain the while rule?

        Q: How would you deal with assert statements?

        Q: How would you deal with a try statement?

------------------------------------------
         TYPE CHECKING EXAMPLE

Idea: accumulate constraints.

                     {q}
 [q := 0]^1 : Sigma ---> Sigma,   [asg]
                     {q}
                     {r}
 [r := x]^2 : Sigma ---> Sigma    [asg]
                     {r}
 _________________________________ [seq]
 ([q := 0]^1;[r := x]^2)
              {q,r}
     : Sigma ----> Sigma
              {q,r}

                       {r}
 [r := r-y]^4 : Sigma ---> Sigma,  [asg]
                       {r}
                       {q}
 [q := q+1]^5 : Sigma ---> Sigma   [asg]
                       {q}
 _________________________________ [seq]
 ([r := r-y]^4;[q := q+1]^5)
              {q,r}
     : Sigma ----> Sigma
              {q,r}
 ___________________________________[wh]
 while [r >= y]^3 do
    ([r := r-y]^4;[q := q+1]^5)
              {}
     : Sigma ----> Sigma
              {q,r}

so by the seq rule,
------------------------------------------
        ([q := 0]^1;[r := x]^2);
        while [r >= y]^3 do
           ([r := r-y]^4;[q := q+1]^5)
                     {q,r}
            : Sigma ----> Sigma
                     {q,r}

        And with seq again
        (([q := 0]^1;[r := x]^2);
         while [r >= y]^3 do
            ([r := r-y]^4;[q := q+1]^5));
        assert [0<=r and r Sigma {q,r}

        Q: How is this different than with annotated base types?
           It presents the changes (deltas) instead of the accumulations.
           This is "higher-order".

** effect systems

        These are similar to annotated base types,
        except that in higher-order languages,
        we have latent effects associated with functions.

*** example: call tracking analysis

        we want to know what functions may be called by a given expression

------------------------------------------
               EXAMPLE

Judgments:
      Gamma |- e : t & phi

where Gamma : Var -> Type
      e : Expression
      t : Type
      phi : Effect

                           phi
   Type = int | bool | t1 ---> t2

   phi : Powerset(FunName)

 [var]   Gamma |- x : t & {},   if Gamma(x) = t

         Gamma[x |-> tx] |- e : t & phi
 [fn]    --------------------------------
                                     phi2
         Gamma |- fn_pi x => e : tx ------> t & {}
            if phi2 = phi \cup {pi}

                           phi
         Gamma |- e1 : t2 ---> t & phi1,
         Gamma |- e2 : t2 & phi2
 [app]   --------------------------------
         Gamma |- e1 e2 : t & phi3
            if phi3 = phi1 \cup phi2 \cup phi
------------------------------------------

        Q: What do the judgments mean?

        Q: What do the arrow types mean?

        Q: How would you explain the fn rule?

        Q: How would you explain the app rule?

One advantage of this is that "effect systems are often implemented as extensions of type inference algorithms."
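
        A minimal sketch of how the [var], [fn], and [app] rules for call
        tracking could be turned into a recursive checker, written in Python.
        The class names (Int, Arrow, Var, Fn, App), the function check, and
        the explicit parameter type on Fn are all assumptions made for this
        illustration; they are not part of the notes above. The sketch
        compares arrow types (including their latent effects) for exact
        equality, so it has no subeffecting or subsumption.

```python
from dataclasses import dataclass

# Types: int | t1 --phi--> t2 (bool is omitted to keep the sketch short).
@dataclass(frozen=True)
class Int:
    pass

@dataclass(frozen=True)
class Arrow:
    arg: object
    latent: frozenset  # phi: names of functions that may be called when applied
    res: object

# Expressions (hypothetical AST for this sketch).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Fn:
    pi: str             # the function's name (the label pi in fn_pi)
    param: str
    param_type: object  # assumption: the parameter type tx is given explicitly
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

def check(gamma, e):
    """Return (t, phi) such that gamma |- e : t & phi, or raise TypeError."""
    if isinstance(e, Var):
        # [var]: looking up a variable calls nothing.
        if e.name not in gamma:
            raise TypeError("unbound variable " + e.name)
        return gamma[e.name], frozenset()
    if isinstance(e, Fn):
        # [fn]: the body's effect plus pi becomes the latent effect;
        # building the function itself has the empty effect.
        t, phi = check({**gamma, e.param: e.param_type}, e.body)
        return Arrow(e.param_type, phi | {e.pi}, t), frozenset()
    if isinstance(e, App):
        # [app]: union of both subexpression effects and the latent effect.
        t1, phi1 = check(gamma, e.fun)
        t2, phi2 = check(gamma, e.arg)
        if not isinstance(t1, Arrow) or t1.arg != t2:
            raise TypeError("ill-typed application")
        return t1.res, phi1 | phi2 | t1.latent
    raise TypeError("unknown expression form")

# Usage: (fn_F f => f y) (fn_G z => z), with y : int in the environment.
identity = Fn("G", "z", Int(), Var("z"))
apply_to_y = Fn("F", "f", Arrow(Int(), frozenset({"G"}), Int()),
                App(Var("f"), Var("y")))
result = check({"y": Int()}, App(apply_to_y, identity))
```

        Here result pairs the type Int() with the effect {F, G}: applying
        fn_F calls F, and its body then calls the argument, which is G.
        Failure to find a proof shows up as a raised TypeError.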