COP 5021 Lecture -*-Outline-*-

* Type Checking Notation and Techniques

    see also section 7.4 of Watt's Concepts and Paradigms

------------------------------------------
    TYPE CHECKING VS. TYPE INFERENCE

Type Checking:
    Some declarations built-in (+, *, ...)
    User declares types (of names)
    Compiler infers types (of expressions)
    Compiler compares each inferred use to the declared type
        - with == in non-OO language
        - with <= in OO language (subtyping)

Type Inference:
    Some declarations built-in (+, *, ...)
    Compiler infers types (of expressions)
    Compiler compares all inferred uses
        - with == in non-OO language
        - with <= in OO language (subtyping)
------------------------------------------

** Type checking

    Important notations coming here...

------------------------------------------
    TYPE CHECKING NOTATION: JUDGEMENTS

        G |- x : T

    means given (assuming) type environment G,
    one can prove that x has type T

    TYPE CHECKING NOTATION:
    AXIOMS AND INFERENCE RULES

    [var]  G |- x : T            if G(x) = T

              G |- E : T
    [asgn] -------------------   if G(x) = T
            G |- x := E : ok
------------------------------------------

    G is conventionally written with a Greek letter (\Gamma).

    Q: Is the type environment G an inherited or synthesized attribute?
        inherited
    Q: Would the type T be inherited or synthesized?
        synthesized

    The [var] rule is an axiom (scheme), for all variables x,
    types T, and type environments G.

    The [asgn] rule means: to show that the judgement under the line
    holds, one has to prove the judgement above the line
    and check the side condition.

    Point out the importance of name coincidences in a rule.

*** type environments (Watt's PL Syntax and Semantics, section 3.3.1)

    Q: what if an expression E involves a variable x?
       How to keep track of this information to use?

        introduce the notion of a type environment:
        a map from names to types
        that embodies the assumptions a proof can make.
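    A minimal sketch of how the [var] and [asgn] rules read as code,
    assuming a type environment is a Python dict from names to type names.
    The function names type_of_var and check_asgn are made up for
    illustration; they are not part of the lecture's notation.

```python
def type_of_var(env, x):
    """[var]: G |- x : T  if G(x) = T."""
    if x not in env:
        raise TypeError(f"no assumption about {x} in the environment")
    return env[x]


def check_asgn(env, x, expr_type):
    """[asgn]: G |- x := E : ok  if G |- E : T and G(x) = T.

    expr_type stands for the type T already derived for E (the hypothesis).
    """
    declared = type_of_var(env, x)  # side condition G(x) = T
    if declared != expr_type:
        raise TypeError(f"{x} has type {declared}, but E has type {expr_type}")
    return "ok"


# G = x:int, b:bool
G = {"x": "int", "b": "bool"}
result = check_asgn(G, "x", "int")
```

    Note that the environment is passed down (inherited), while the type
    of a variable and the result "ok" are returned up (synthesized).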
------------------------------------------
            ENVIRONMENTS

    G \in TypeEnv = Identifier -> Type

    {} is the empty TypeEnv

    x:T,y:U is the mapping {x |-> T, y |-> U}

    Adding:
        If G = x:T then G,y:U is x:T,y:U

    Overlay:
        G(+)H is such that
            (G(+)H)(x) = H(x), if x in dom(H)
                         G(x), if x not in dom(H)
------------------------------------------

*** example type checking rules for WHILE

------------------------------------------
   TYPE CHECKING FOR THE WHILE LANGUAGE

    T ::= int | bool | ok

Programs:

            x1:int,...,xn:int |- S : ok
    [prog] --------------------------------
           |- proc (x1,...,xn) is { S } : ok
               if n >= 0 and x1,...,xn are distinct

Statements:

              G |- E : T
    [asgn] -------------------   if G(x) = T
            G |- x := E : ok

    [skip]  G |- skip : ok


    [seq]  --------------------------
               G |- S1; S2: ok


    [if]   -------------------------
           G |- if E then S1 else S2 : ok

            G |- E : bool, G |- S: ok
    [wh]   --------------------------
             G |- while E do S : ok

Expressions:

    [var]   G |- x : T           if G(x) = T

    [num]   G |- n : int


    [aexp] -----------------------------
              G |- E1 op_a E2 : int

    [true]  G |- true: bool

    [false] G |- false: bool

               G |- E : bool
    [not]  ---------------------
             G |- not E : bool

           G |- E1 : bool, G |- E2 : bool
    [bexp] -----------------------------
              G |- E1 op_b E2 : bool


    [rexp] -----------------------------
              G |- E1 op_r E2 : bool
------------------------------------------

    Q: What hypotheses are needed for the [seq] rule?
        ... for [seq], hypotheses: G |- S1 : ok, G |- S2 : ok
    Q: What hypotheses are needed for the [if] rule?
        ... for [if], hypotheses: G |- E : bool, G |- S1: ok, G |- S2: ok
    Q: Why don't we need two rules for if statements?
    Q: What hypotheses are needed for the [aexp] rule?
        ... for [aexp], hypotheses: G |- E1 : int, G |- E2 : int
    Q: What hypotheses are needed for the [rexp] rule?
        ... for [rexp], hypotheses: G |- E1 : int, G |- E2 : int

**** variations

    Q: What if we don't assume that all variables have type int in a program?
        Could either infer them...
    Q: What if we had type declarations for variables?
        That would determine the initial type environment.

------------------------------------------
            OPERATOR TYPES

         G |- E1: S, G |- E2: S,
         G |- op : S x S -> T
    [binexp] -------------------
         G |- E1 op E2 : T

    Either have axioms for various operators:

        G |- + : int x int -> int
        G |- and : bool x bool -> bool
        G |- < : int x int -> bool

    Or put them in the initial environment
    for the program:

             G_FV |- S : ok
    [prog] -----------------
             |- { S } : ok
               if x \in FV(S) ==> G_FV(x) = int
               and G_FV(+) = int x int -> int, ...
------------------------------------------

    Q: When would you prefer putting operators in the environment?
        When operators can also be user-defined,
        either overridden (as in C++)
        or statically scoped (as in Haskell)

*** example, type checking rules for SML-like expressions

    Note, we already don't declare the type of the function's body...

------------------------------------------
      TYPE CHECKING FOR SML SUBSET

Abstract Syntax:

    E ::= I | E1(E2) | if E1 then E2 else E3
        | fn I:T1 => E | let D in E2 | (E1,E2)
    D ::= val I:T = E | rec I:T = E | D1; D2
    T ::= int | bool | T1 -> T2 | T1 * T2
        | TV | T1 list | all TV . T1
    TV ::= 'a | 'b | 'c | ...

Expressions (E):

    [var]  G |- I: T             if


    [app]  ----------------------
              G |- E1(E2): T


    [if]   -------------------------------
           G |- (if E1 then E2 else E3): T


    [fn]   ---------------------------   if G'
           G |- (fn I:T1 => E): T1->T2


    [fnp]  ---------------------------   if G'
           G |- (fn (I1,I2):      => E) :


    [pair] ---------------------
              G |- (E1,E2) :

           G |- D ==> G', G'' |- E2 :
    [let]  ---------------------         if G'' =
           G |- (let D in E2): T

Declarations (D):


    [val]  -----------------------       if G' = I:T
           G |- val I:T = E ==> G'


    [rec]  -----------------------       if G' = I:T
           G |- rec I:T = E1 ==> G'


    [Dseq] ----------------------
           G |- D1; D2 ==>
------------------------------------------

    Q: What should the hypotheses be for the [app] rule?
        ... G |- E1: T1 -> T, G |- E2: T1
    Q: What should the hypotheses be for the [if] rule?
        ...
        G |- E1: bool, G |- E2: T, G |- E3: T
    Q: What should the hypotheses be for the [fn] rule?
        ... G,I:T1 |- E: T2
    Q: what should be the rule for functions with formals that are pairs?

                      G' |- E:T3
       [fnp] ----------------------------------------   if G' = G,I1:T1,I2:T2
             G |- (fn (I1,I2):T1*T2 => E) : T1*T2 -> T3    and I1 != I2

    Q: what should be the rule for pairs?

             G |- E1:T1, G |- E2:T2
      [pair] ---------------------
             G |- (E1,E2): T1 * T2

    Q: do we need a rule for applications where the arguments are pairs?
        yes...
    Q: do we really need the [pair] rule as a built-in rule?
        no, can think of it as sugar for a function application

    Key idea: expressions are checked from the bottom up
        (on desugared syntax tree)
        (data driven recursion = structural induction)

    Note how the rule [app] is like modus ponens
        (Curry-Howard isomorphism)

    Note the idea of "delta" or "mini-environments"
    in the val and rec rules.

    Q: What should the hypothesis be for the [val] rule?
        for the [val] rule: ... G |- E:T
        (note: In ML, rec is really "val rec", but I'm saving space here...)
    Q: What should the hypothesis be for the [rec] rule?
        for the [rec] rule: ... G(+)G' |- E1:T
    Q: What should the hypothesis be for the [Dseq] rule?
        for the [Dseq] rule: ... G |- D1 ==> G1, G(+)G1 |- D2 ==> G2
    Q: What should the conclusion be for the [Dseq] rule?
        ... G1(+)G2
        (note these are just the delta, mini-environments)
    Q: What should G'' be in the side condition of the [let] rule?
        ... G'' = G (+) G'
    Q: What should the type of E2 be in the [let] rule's 2nd hypothesis?
        ... T
    Q: What about simultaneous binding? mutual recursion (and)?

             G |- D1 ==> G1, G |- D2 ==> G2
      [Dsim] ----------------------------   if G3 = G1 \uplus G2
             G |- D1 & D2 ==> G3               and dom(G1) \cap dom(G2) = {}

             G3 |- D1 ==> G1, G3 |- D2 ==> G2
      [Drec] ----------------------------   if G3 = G1 \uplus G2
             G |- D1 and D2 ==> G3             and dom(G1) \cap dom(G2) = {}

    But the pattern matching syntax is important for getting bindings
    to all the parts of the formal.
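    The bottom-up (structural induction) reading of the [var], [app], [if],
    [fn], [pair], and [let]/[val] rules can be sketched as a recursive
    function. This is only an illustration under stated assumptions:
    expressions and types are encoded as Python tuples, the let form is
    restricted to a single val declaration, and the names check, arrow,
    and pair_t are hypothetical.

```python
def arrow(t1, t2):
    return ("->", t1, t2)


def pair_t(t1, t2):
    return ("*", t1, t2)


def check(env, e):
    """Return T such that env |- e : T, or raise TypeError."""
    tag = e[0]
    if tag == "var":                       # [var]: G(I) = T
        if e[1] not in env:
            raise TypeError(f"unbound identifier {e[1]}")
        return env[e[1]]
    if tag == "app":                       # [app]: E1: T1 -> T, E2: T1
        fun_type = check(env, e[1])
        arg_type = check(env, e[2])
        if not (isinstance(fun_type, tuple) and fun_type[0] == "->"):
            raise TypeError("operator is not a function")
        if fun_type[1] != arg_type:
            raise TypeError("operator and operand don't agree")
        return fun_type[2]
    if tag == "if":                        # [if]: E1: bool, E2: T, E3: T
        if check(env, e[1]) != "bool":
            raise TypeError("test of if is not bool")
        then_type = check(env, e[2])
        if then_type != check(env, e[3]):
            raise TypeError("branches of if have different types")
        return then_type
    if tag == "fn":                        # [fn]: G,I:T1 |- E : T2
        _, name, t1, body = e
        return arrow(t1, check({**env, name: t1}, body))
    if tag == "pair":                      # [pair]
        return pair_t(check(env, e[1]), check(env, e[2]))
    if tag == "let":                       # [let] with D = val I:T = E1
        _, name, t, e1, e2 = e
        if check(env, e1) != t:            # [val] hypothesis: G |- E1 : T
            raise TypeError(f"declared type of {name} does not match")
        return check({**env, name: t}, e2) # G'' = G (+) G'
    raise TypeError(f"unknown expression form {tag}")


# G = 3:int, plus: int*int -> int, x: int  (literals treated as names)
G = {"3": "int", "plus": arrow(pair_t("int", "int"), "int"), "x": "int"}
plus_x_3 = ("app", ("var", "plus"), ("pair", ("var", "x"), ("var", "3")))
```

    With this encoding, check(G, plus_x_3) follows the same steps as a
    proof tree: [var] at the leaves, then [pair], then [app].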
    Generalization of the [fnp] rule for datatypes (omit)

        T ::= datatype c2(x) of T2 | c3(y) of T3

             G |- E1 : datatype c2(x) of T2 | c3(y) of T3,
             G2 |- E2:T, G3 |- E3:T                           if G2 = G,x:T2
      [case] ---------------------------------------          and G3 = G,y:T3
             G |- (case E1 of c2(x) => E2 | c3(y) => E3): T

*** proofs

------------------------------------------
        PROOFS OF TYPE CHECKING

                     [var] Gfxy |- x: 'a, Gfxy |- y :'b
                    [pair] ------------------------------
 [var] Gfxy |- f :'a*'b->'c,     Gfxy |- (x,y) : 'a*'b
 [app] ----------------------------
        Gfxy |- f(x,y) : 'c
 [fn]  ----------------------------
        Gxy |- (fn f:'a*'b->'c => f(x,y))
               : ('a*'b->'c) -> 'c
 [fnp] ----------------------------
        ETE |- (fn (x,y):'a*'b =>
                  (fn f:'a*'b->'c => f(x,y)))
               : 'a*'b -> (('a*'b->'c) -> 'c)

  where ETE = {}
    and Gxy = x:'a, y:'b
    and Gfxy = f: 'a*'b->'c, x:'a, y:'b
------------------------------------------

    Q: can you prove G |- plus(x,3):int
       where G = 3:int, plus: int*int -> int, x: int ?

** Reconstructing (Inferring) type declarations for monomorphic expressions

    Basic ideas due to Hindley and Milner.
    Made possible by unification

    See also the Cardelli paper in the references
    (Basic Polymorphic Typechecking)

*** example

------------------------------------------
      TYPE RECONSTRUCTION EXAMPLE

                     [var] Gfxy |- x:    , Gfxy |- y :
                    [pair] ------------------------------
 [var] Gfxy |- f :               Gfxy |- (x,y) :
 [app] ----------------------------
        Gfxy |- f(x,y) :
 [fn]  ----------------------------
        Gxy |- (fn f => f(x,y)) :
 [fnp] ----------------------------
        ETE |- (fn (x,y) => (fn f => f(x,y))) :

  where ETE = {}
    and Gxy = x:'a, y:'b
    and Gfxy = f:     , x:'a, y:'b

 CONSTRAINTS:

------------------------------------------

    Write down constraints on types as equations

*** inference rules without considering polymorphism

------------------------------------------
   TYPE RECONSTRUCTION (INFERENCE) PART 1

    [var]  G |- I: T                 if G(I) = T

           G |- E1: S -> T, G |- E2: S
    [app]  ---------------------------
                G |- E1(E2): T

           G |- E1:bool, G |- E2:T, G |- E3:T
    [if]   ----------------------------------
           G |- (if E1 then E2 else E3): T


    [fn]   ------------------------  if G' =
           G |- (fn I => E): T1->T2

           G |- D ==> G', G'' |- E : T
    [let]  --------------------      if G'' = G(+)G'
           G |- (let D in E): T
------------------------------------------

    Q: What would be the hypothesis for the [fn] rule?
        ... G' |- E:T2, if G' = G,I:T1

*** polymorphism (skip)

------------------------------------------
     WHEN IS A FUNCTION POLYMORPHIC?

    Is this a type error?

        (fn f => g(f(2), f(true)))

    What if f is:

        val id = (fn x => x)
        val succ = (fn x => x + 1)
------------------------------------------

    In ML, the solution is that
    after you bind with val (or val rec),
    mark type variables as generic (as in id);
    thus within a function declaration, the argument type is non-generic
    but after the declaration it becomes generic

------------------------------------------
        DON'T ALLOW CAPTURING

    let val g =
          (fn x =>
             let val f = (fn y => x)
             in if f(3) then f(true) else x + 5
             end)
    in ...
    end
------------------------------------------

    the problem with the above would be if we concluded that f had type
        all 'a . all 'b . ('a -> 'b)
    instead of
        all 'a . ('a -> int)

    so can't mark a type variable as generic
    if it's being used in the current type environment;
    logically, don't want the all quantifier to capture 'b.

    formally: a type variable occurring in the type of an expression E
        is generic for a given scope
        if it does not occur in the type of a lambda-variable declaration
        that encloses the given scope

    notation: in the type all 'a . 'a -> 'b
        'a is a generic type variable
        The notation "all T . ..." is read "for all types T, ..."

    non-generic type variables:
        type variables appearing in the type of a lambda-bound identifier
        - shared among all occurrences of the lambda-bound id (e.g., f)
        - prevents heterogeneous use of lambda-bound identifiers

    generic type variables:
        when used, replace by a normal type variable
        in practice, each instance of all 'a . 'a -> 'b
        is of the form 'c -> 'b
        where 'c is a fresh (not used elsewhere) non-generic type var

------------------------------------------
        GENERIC TYPE VARIABLES

    in the type: all 'a . 'a -> 'b
    'a is generic, and 'b is not

 [var]   G |- id: all 'a . 'a -> 'a
 [gElim] --------------------------------
         G |- id: 'b -> 'b, G |- id: 'c -> 'c,
         G |- 3: int, G |- true : bool
 [app]   ------------------------------------
         G |- id(3):int, G |- id(true): bool
 [pair]  -----------------------------------
         G |- (id(3), id(true)) : int * bool

  where G = {id |-> (all 'a . 'a -> 'a),
             3 |-> int, true |-> bool}

 CONSTRAINTS:
    'b = int
    'c = bool
------------------------------------------

------------------------------------------
    TYPE RECONSTRUCTION (INFERENCE)
        GENERICS and BINDINGS

               G |- E: T
    [val]  ____________________
           G |- (val I = E) ==> G'      if G' = I:gen(T,G)

               G'' |- E1: T
    [rec]  ___________________
           G |- rec I = E1 ==> G'       if G' = I:gen(T,G)
                                        and G'' = G,I:T

    where gen(T, G) = all V1 . ... . all Vk . T
      if V1,...,Vk are all the free type variables
      in T that are not in range(G).
              G |- I : all V . T
    [gElim] __________________
              G |- I : T'               if T' = [T''/V]T
                                        and T'' is fresh
------------------------------------------

    The notation [T''/V]T means substitution of T'' for
    free occurrences of V in T.
    Here T'' is arbitrary, but must be fresh.

*** polymorphic rules (skip)

------------------------------------------
              EXAMPLE

 G |- rec map =
        (fn f =>
          (fn ls => if (null ls) then []
                    else (f (hd ls))::(map f (tl ls))))
      ==>

 where G is
    (op ::): all 'a . 'a * 'a list -> 'a list,
    [] : all 'a . 'a list,
    null : all 'a . 'a list -> bool,
    hd : all 'a . 'a list -> 'a,
    tl : all 'a . 'a list -> 'a list
------------------------------------------

 [lemma1] G' |- (null ls): bool,
 [lemma2] G' |- []: 'h list,
 [lemma3] G' |- (f (hd ls)) ::(map f (tl ls)) : 'h list
 [if] ___________________________________
      G' |- if (null ls) then []
            else (f (hd ls)) ::(map f (tl ls))
         : 'h list
 [fn] ___________________________________
      G, map:'b, f:'c
         |- (fn ls => if (null ls) then []
                      else (f (hd ls)) ::(map f (tl ls)))
         : 'g list -> 'h list
 [fn] ____________________________________
      G, map: 'b
         |- (fn f =>
              (fn ls => if (null ls) then []
                        else (f (hd ls)) ::(map f (tl ls))))
         : ('g -> 'h) -> 'g list -> 'h list
 [rec] ____________________________________
      G |- rec map =
             (fn f =>
               (fn ls => if (null ls) then []
                         else (f (hd ls)) ::(map f (tl ls))))
        ==> G[map |-> ('g -> 'h) -> 'g list -> 'h list]

  where G' = G, map: 'b, f : 'c, ls : 'd

  constraints:
    'b = ('g -> 'h) -> ('g list -> 'h list),
    'c = 'g -> 'h,
    'd = 'g list

    Lemma 1: if ls has some type, then G' |- (null ls): bool
    Proof:

     [var]   G' |- null : all 'a . 'a list -> bool
     [gElim] __________________________
             G' |- null : 'g list -> bool,  [var] G' |- ls : 'd
     [app]   __________________________________
             G' |- (null ls): bool

     constraints: 'd = 'g list
    QED

    Have them do lemma 2 (lemma 3 takes too long).
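    A minimal sketch of gen(T, G) and of the [gElim] instantiation step
    used in Lemma 1, assuming types are encoded as Python values: strings
    starting with ' are type variables, tuples such as ("->", T1, T2) and
    ("list", T) are constructed types, and ("all", [V1,...,Vk], T) is a
    generic type. The names free_vars, gen, instantiate, and the fresh
    variable naming scheme 't0, 't1, ... are assumptions made for this
    illustration.

```python
import itertools

_counter = itertools.count()


def fresh_var():
    # Assumes no user-written type variable is named 't0, 't1, ...
    return f"'t{next(_counter)}"


def free_vars(t):
    """Free type variables of a type or generic type."""
    if isinstance(t, str):
        return {t} if t.startswith("'") else set()
    if t[0] == "all":
        _, bound, body = t
        return free_vars(body) - set(bound)
    result = set()
    for arg in t[1:]:
        result |= free_vars(arg)
    return result


def substitute(t, mapping):
    """[T''/V]T for each V |-> T'' in mapping (t has no quantifiers)."""
    if isinstance(t, str):
        return mapping.get(t, t)
    return (t[0],) + tuple(substitute(arg, mapping) for arg in t[1:])


def gen(t, env):
    """Quantify the free variables of t that do not occur in range(env)."""
    env_vars = set()
    for assumed in env.values():
        env_vars |= free_vars(assumed)
    generic = sorted(free_vars(t) - env_vars)
    return ("all", generic, t) if generic else t


def instantiate(scheme):
    """[gElim]: replace each generic variable by a fresh one."""
    if isinstance(scheme, tuple) and scheme[0] == "all":
        _, bound, body = scheme
        return substitute(body, {v: fresh_var() for v in bound})
    return scheme


# f = (fn y => x) inside (fn x => ...): 'b is used by x, so stays non-generic
f_type = gen(("->", "'a", "'b"), {"x": "'b"})
null_instance = instantiate(("all", ["'a"], ("->", ("list", "'a"), "bool")))
```

    Here f_type quantifies only 'a, which is the "don't allow capturing"
    restriction, and each call to instantiate yields a different instance.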
    Key ideas:
        think of the checks to be made as either succeeding,
        or constraining the result (by unification)

        accumulate a set of constraints

        types that are unconstrained at the end turn into
        universally quantified (all 'a . TE('a)) types.

        names bound with let (val or rec) are polymorphic,
        but monomorphic instances are used in the program.
        (see section 7.3 of Watt's Concepts and Paradigms book)

    Formalization: system of typings and equations,
        solve equations for unknowns

** Limitations

*** No recursive types:

        type 'a stream = unit -> ('a * 'a stream)

    no way to use this in the algorithm

    solution: use abstraction to break the recursion
        (think of the above equation as an isomorphism)
        up, down are provided implicitly by the datatype constructor
        (up when applied to values,
         down when used in pattern matching)

*** Polymorphic functions are not first-class objects!

    type variables in a function parameter are not generic

------------------------------------------
  NO POLYMORPHIC FUNCTIONS AS ARGUMENTS

 - fun F f = fn (a,b) => (f(a), f(b));
 val F = fn : ('a -> 'b) -> 'a * 'a -> 'b * 'b

 - val W = (fn x => (x x));
 std_in:1.18-1.20 Error: operator is not a function
   operator: 'Z
   in expression:
     (x x)
------------------------------------------

    All arguments to functions in ML have one type
    (which is the correct instance if possible).

    so no self-application, thus no Y combinator

    another example of this...

 - fun S f g x = ((f x) (g x));
 val S = fn : ('a -> 'b -> 'c) -> ('a -> 'b) -> 'a -> 'c
 - fun I x = x;
 val I = fn : 'a -> 'a
 - S I I;
 std_in:8.1-8.5 Error: operator and operand don't agree (circularity)
   operator domain: ('Z -> 'Y) -> 'Z
   operand:         ('Z -> 'Y) -> 'Z -> 'Y
   in expression:
     S I I
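
    A minimal sketch of solving type equations by unification, assuming
    the same kind of encoding as the rules suggest: strings starting with
    ' are type variables and tuples such as ("->", T1, T2) or ("list", T)
    are constructed types. The function names and the dict-based
    substitution are illustrative choices, not the algorithm of any
    particular ML compiler. The occurs check is what rejects an equation
    like 'Z = 'Z -> 'Y, which is the kind of equation self-application
    (x x) would need.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("'")


def resolve(t, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def apply_subst(t, subst):
    """Replace every bound variable in t by its solution."""
    t = resolve(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply_subst(arg, subst) for arg in t[1:])
    return t


def occurs(v, t, subst):
    """Does variable v occur in t (under subst)?"""
    t = resolve(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, subst) for arg in t[1:])
    return False


def unify(a, b, subst):
    """Return an extended substitution solving a = b, or raise TypeError."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if is_var(a):
        if occurs(a, b, subst):
            raise TypeError("circularity")
        return {**subst, a: b}
    if is_var(b):
        return unify(b, a, subst)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
        return subst
    raise TypeError("constructor mismatch")


# Constraints from the map example
constraints = [
    ("'b", ("->", ("->", "'g", "'h"), ("->", ("list", "'g"), ("list", "'h")))),
    ("'c", ("->", "'g", "'h")),
    ("'d", ("list", "'g")),
]
solution = {}
for left, right in constraints:
    solution = unify(left, right, solution)
map_type = apply_subst("'b", solution)
```

    The variables 'g and 'h stay unconstrained in map_type, so they are
    the ones that gen would quantify at the val/rec binding.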