COP 5021 Lecture -*-Outline-*-
* Type Checking Notation and Techniques
see also section 7.4 of Watt's Concepts and Paradigms
------------------------------------------
TYPE CHECKING VS. TYPE INFERENCE
Type Checking:
Some declarations built-in (+, *, ...)
User declares types (of names)
Compiler infers types (of expressions)
Compiler compares
each inferred use to declared
- with == in non-OO language
- with <= in OO language (subtyping)
Type Inference
Some declarations built-in (+, *, ...)
Compiler infers types (of expressions)
Compiler compares
all inferred uses
- with == in non-OO language
- with <= in OO language (subtyping)
------------------------------------------
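The difference between the two comparisons (== versus <=) can be sketched in a few lines of Python. This is only an illustration: the class names and the SUPERTYPE table are hypothetical, and a single-inheritance hierarchy is assumed.

```python
# Hypothetical hierarchy (an assumption for this sketch):
# each type maps to its direct supertype.
SUPERTYPE = {"Square": "Rectangle", "Rectangle": "Shape"}

def is_subtype(s, t):
    """Return True if s <= t, walking up the assumed hierarchy from s."""
    while s is not None:
        if s == t:
            return True
        s = SUPERTYPE.get(s)
    return False

def compatible(inferred, declared, oo=False):
    """Compare an inferred type to a declared one:
    with == in a non-OO language, with <= when subtyping is allowed."""
    return is_subtype(inferred, declared) if oo else inferred == declared
```

With these definitions, compatible("Square", "Shape") is rejected under ==, but accepted when oo=True.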
** Type checking
Important notations coming here...
------------------------------------------
TYPE CHECKING NOTATION: JUDGEMENTS
G |- x : T
means given/assuming type environment G,
one can prove that x has type T
TYPE CHECKING NOTATION:
AXIOMS AND INFERENCE RULES
[var] G |- x : T if G(x) = T
G |- E : T
[asgn] ------------------- if G(x) = T
G |- x := E : ok
------------------------------------------
G is conventionally written with a Greek letter (Gamma).
Q: Is the type environment G an inherited or synthesized attribute?
inherited
Q: Would the type T be inherited or synthesized?
synthesized
The [var] rule is an axiom (scheme), for all variables x,
types T, and type environments G.
The [asgn] rule means: to show that the judgement under the line holds,
one has to prove the judgement on top
and check the side condition.
Point out the importance of name coincidences in a rule.
*** type environments (Watt's PL Syntax and Semantics, section 3.3.1)
Q: what if an expression E involves a variable x?
How to keep track of this information to use?
introduce the notion of a type environment
map from names to types
that embodies the assumptions a proof can make.
------------------------------------------
ENVIRONMENTS
G \in TypeEnv = Identifier -> Type
{} is the empty TypeEnv
x:T,y:U is the mapping {x |-> T, y |-> U}
Adding: If G = x:T then
G,y:U is x:T,y:U
Overlay: G(+)H is such that
(G(+)H)(x) = H(x), if x in dom(H)
G(x) if x not in dom(H)
------------------------------------------
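One way to make these operations concrete is to model a TypeEnv as a Python dict from identifiers to types. This is a sketch under that assumption; the function names add and overlay are made up for the illustration.

```python
EMPTY = {}  # {} is the empty TypeEnv

def add(env, name, ty):
    """G,y:U : a new environment that extends env with name |-> ty."""
    new_env = dict(env)      # copy, so the original G is unchanged
    new_env[name] = ty
    return new_env

def overlay(g, h):
    """G(+)H : bindings in h take precedence over those in g."""
    return {**g, **h}

g = add(add(EMPTY, "x", "int"), "y", "bool")   # x:int,y:bool
h = {"y": "int", "z": "bool"}
gh = overlay(g, h)                             # y comes from h, x from g
```

Note that overlay is not symmetric: overlay(h, g) would keep g's binding for y instead.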
*** example type checking rules for WHILE
------------------------------------------
TYPE CHECKING FOR THE WHILE LANGUAGE
T ::= int | bool | ok
define
Programs:
x1:int,...,xn:int |- S : ok
[prog] --------------------------------
|- proc (x1,...,xn) is { S } : ok
if n >= 0
and x1,...,xn are distinct
Statements:
G |- E : T
[asgn] ------------------- if G(x) = T
G |- x := E : ok
[skip] G |- skip : ok
[seq] --------------------------
G |- S1; S2: ok
[if] -------------------------
G |- if E then S1 else S2 : ok
G |- E : bool, G |- S: ok
[wh] --------------------------
G |- while E do S : ok
Expressions:
[var] G |- x : T if G(x) = T
[num] G |- n : int
[aexp] -----------------------------
G |- E1 op_a E2 : int
[true] G |- true: bool
[false] G |- false: bool
G |- E : bool
[not] ---------------------
G |- not E : bool
G |- E1 : bool, G |- E2 : bool
[bexp] -----------------------------
G |- E1 op_b E2 : bool
[rexp] -----------------------------
G |- E1 op_r E2 : bool
------------------------------------------
Q: What hypotheses are needed for the [seq] rule?
... for [seq], hypotheses: G |- S1 : ok, G |- S2 : ok
Q: What hypotheses are needed for the [if] rule?
... for [if], hypotheses: G |- E : bool,
G |- S1: ok, G |- S2: ok
Q: Why don't we need two rules for if statements?
Q: What hypotheses are needed for the [aexp] rule?
... for [aexp], hypotheses: G |- E1 : int, G |- E2 : int
Q: What hypotheses are needed for the [rexp] rule?
... for [rexp], hypotheses: G |- E1 : int, G |- E2 : int
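The rules above, with the hypotheses filled in, can be read as a recursive checker. The following Python sketch assumes a tuple encoding of the abstract syntax (the tags "aop", "bop", "rop", and so on are invented for this illustration); each branch corresponds to one rule.

```python
class TypeCheckError(Exception):
    """Raised when no typing rule applies."""

def expect(actual, wanted, what):
    if actual != wanted:
        raise TypeCheckError(f"{what}: expected {wanted}, got {actual}")

def type_of(env, e):
    """Synthesize T such that env |- e : T."""
    tag = e[0]
    if tag == "var":                          # [var]
        if e[1] not in env:
            raise TypeCheckError(f"undeclared variable {e[1]}")
        return env[e[1]]
    if tag == "num":                          # [num]
        return "int"
    if tag in ("true", "false"):              # [true], [false]
        return "bool"
    if tag == "not":                          # [not]
        expect(type_of(env, e[1]), "bool", "operand of not")
        return "bool"
    if tag in ("aop", "bop", "rop"):          # [aexp], [bexp], [rexp]
        _, op, e1, e2 = e
        operand = "bool" if tag == "bop" else "int"
        expect(type_of(env, e1), operand, f"left operand of {op}")
        expect(type_of(env, e2), operand, f"right operand of {op}")
        return "int" if tag == "aop" else "bool"
    raise TypeCheckError(f"unknown expression form {tag}")

def check_stmt(env, s):
    """Return "ok" if env |- s : ok, otherwise raise TypeCheckError."""
    tag = s[0]
    if tag == "skip":                         # [skip]
        return "ok"
    if tag == "asgn":                         # [asgn], side condition G(x) = T
        _, x, e = s
        if x not in env:
            raise TypeCheckError(f"undeclared variable {x}")
        expect(type_of(env, e), env[x], f"assignment to {x}")
        return "ok"
    if tag == "seq":                          # [seq]
        check_stmt(env, s[1])
        check_stmt(env, s[2])
        return "ok"
    if tag == "if":                           # [if]
        expect(type_of(env, s[1]), "bool", "if condition")
        check_stmt(env, s[2])
        check_stmt(env, s[3])
        return "ok"
    if tag == "while":                        # [wh]
        expect(type_of(env, s[1]), "bool", "while condition")
        check_stmt(env, s[2])
        return "ok"
    raise TypeCheckError(f"unknown statement form {tag}")

def check_prog(params, body):
    """[prog]: all parameters are distinct and have type int."""
    if len(set(params)) != len(params):
        raise TypeCheckError("parameters must be distinct")
    return check_stmt({x: "int" for x in params}, body)

# proc (x, y) is { while x < y do x := x + 1 }
loop = ("while", ("rop", "<", ("var", "x"), ("var", "y")),
        ("asgn", "x", ("aop", "+", ("var", "x"), ("num", 1))))
result = check_prog(["x", "y"], loop)
```

The environment is passed down (inherited) and the type is returned upward (synthesized), matching the attribute discussion earlier.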
**** variations
Q: What if we don't assume that all variables have type int in a program?
Could either infer them...
Q: What if we had type declarations for variables?
That would determine the initial type environment.
------------------------------------------
OPERATOR TYPES
G |- E1: S, G |- E2: S,
G |- op : S x S -> T
[binexp] -------------------
G |- E1 op E2 : T
Either have axioms for various operators:
G |- + : int x int -> int
G |- and : bool x bool -> bool
G |- < : int x int -> bool
Or put them in the initial environment
for the program:
G_FV |- S : ok
[prog] ----------------- if x \in FV(S) ==> G_FV(x) = int
|- { S } : ok and G_FV(+) = int x int -> int, ...
------------------------------------------
Q: When would you prefer putting operators in the environment?
When operators can also be user-defined, either
overloaded (as in C++) or
statically scoped (as in Haskell)
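Putting operators in the environment can be sketched as follows. The encoding of a signature op : S x S -> T as the pair ((S, S), T) and the name binexp_type are assumptions made for this illustration.

```python
# Operator signatures stored as ((S, S), T), standing for op : S x S -> T.
INITIAL_ENV = {
    "+": (("int", "int"), "int"),
    "and": (("bool", "bool"), "bool"),
    "<": (("int", "int"), "bool"),
}

def binexp_type(env, op, t1, t2):
    """[binexp]: look up op in env and require the operand types to match."""
    if op not in env:
        raise LookupError(f"operator {op} is not in scope")
    operands, result = env[op]
    if (t1, t2) != operands:
        raise TypeError(f"{op} expects {operands}, got {(t1, t2)}")
    return result

# A user-defined "+" on bool shadows the built-in one in an inner scope.
inner_env = {**INITIAL_ENV, "+": (("bool", "bool"), "bool")}
```

Because the operator is found by an ordinary environment lookup, a user definition in an inner scope needs no change to the [binexp] rule itself.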
*** example, type checking rules for SML-like expressions
Note, we already don't declare the type of the function's body...
------------------------------------------
TYPE CHECKING FOR SML SUBSET
Abstract Syntax:
E ::= I | E1(E2) | if E1 then E2 else E3
| fn I:T1 => E | let D in E2 | (E1,E2)
D ::= val I:T = E | rec I:T = E | D1; D2
T ::= int | bool | T1 -> T2 | T1 * T2
| TV | T1 list | all TV . T1
TV ::= 'a | 'b | 'c | ...
Expressions (E):
[var] G |- I: T if
[app] ----------------------
G |- E1(E2): T
[if] -------------------------------
G |- (if E1 then E2 else E3): T
[fn] --------------------------- if G'
G |- (fn I:T1 => E): T1->T2
[fnp] --------------------------- if G'
G |- (fn (I1,I2): ) => E) :
[pair] ---------------------
G |- (E1,E2) :
G |- D ==> G',
G'' |- E2 :
[let] --------------------- if G'' =
G |- (let D in E2): T
Declarations (D):
[val] ----------------------- if G' = I:T
G |- val I:T = E ==> G'
[rec] ----------------------- if G' = I:T
G |- rec I:T = E1 ==> G'
[Dseq] ----------------------
G |- D1; D2 ==>
------------------------------------------
Q: What should the hypotheses be for the [app] rule?
... G |- E1: T1 -> T, G |- E2: T1
Q: What should the hypotheses be for the [if] rule?
... G |- E1: bool, G |- E2: T, G |- E3: T
Q: What should the hypotheses be for the [fn] rule?
... G,I:T1 |- E: T2
Q: what should be the rule for functions with formals that are pairs?
G' |- E:T3
[fnp] ---------------------------------------- if G' = G,I1:T1,I2:T2
G |- (fn (I1,I2):T1*T2 => E) : T1*T2 -> T3 and I1 != I2
Q: what should be the rule for pairs?
G |- E1:T1, G |- E2:T2
[pair] ---------------------
G |- (E1,E2): T1 * T2
Q: do we need a rule for applications where the arguments are pairs?
yes...
Q: do we really need the [pair] rule as a built-in rule?
no, can think of it as sugar for a function application
Key idea: expressions are checked from the bottom up
(on desugared syntax tree)
(data driven recursion = structural induction)
Note how the rule [app] is like modus ponens (Curry-Howard isomorphism)
Note the idea of "delta" or "mini-environments"
in the val and rec rules.
Q: What should the hypothesis be for the [val] rule?
for the [val] rule:
... G |- E:T
(note: In ML, rec is really "val rec", but I'm saving space here...)
Q: What should the hypothesis be for the [rec] rule?
for the [rec] rule:
... G(+)G' |- E1:T
Q: What should the hypothesis be for the [Dseq] rule?
for the [Dseq] rule:
... G |- D1 ==> G1, G(+)G1 |- D2 ==> G2
Q: What should the conclusion be for the [Dseq] rule?
... G1(+)G2 (note these are just the delta, mini-environments)
Q: What should G'' be in the side condition of the [let] rule?
... G'' = G (+) G'
Q: What should the type of E2 be in the [let] rule's 2nd hypothesis?
... T
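The completed rules can be collected into a small checker for the explicitly typed subset. This Python sketch assumes a tuple encoding of expressions, declarations, and types (all tags are invented here), and treats type variables such as 'a as opaque type names compared by equality.

```python
class TypeCheckError(Exception):
    """Raised when no typing rule applies."""

def fn_t(a, b):
    return ("->", a, b)      # T1 -> T2

def pair_t(a, b):
    return ("*", a, b)       # T1 * T2

def type_of(env, e):
    """Synthesize T such that env |- e : T."""
    tag = e[0]
    if tag == "var":                                   # [var]
        if e[1] not in env:
            raise TypeCheckError(f"unbound identifier {e[1]}")
        return env[e[1]]
    if tag == "app":                                   # [app]
        t1, t2 = type_of(env, e[1]), type_of(env, e[2])
        if not (isinstance(t1, tuple) and t1[0] == "->" and t1[1] == t2):
            raise TypeCheckError(f"cannot apply {t1} to {t2}")
        return t1[2]
    if tag == "if":                                    # [if]
        if type_of(env, e[1]) != "bool":
            raise TypeCheckError("if condition must be bool")
        t = type_of(env, e[2])
        if type_of(env, e[3]) != t:
            raise TypeCheckError("if branches must have the same type")
        return t
    if tag == "fn":                                    # [fn]: G,I:T1 |- E : T2
        _, i, t1, body = e
        return fn_t(t1, type_of({**env, i: t1}, body))
    if tag == "fnp":                                   # [fnp]
        _, i1, i2, t1, t2, body = e
        if i1 == i2:
            raise TypeCheckError("formals of a pair must be distinct")
        return fn_t(pair_t(t1, t2), type_of({**env, i1: t1, i2: t2}, body))
    if tag == "pair":                                  # [pair]
        return pair_t(type_of(env, e[1]), type_of(env, e[2]))
    if tag == "let":                                   # [let]: G'' = G(+)G'
        delta = elab(env, e[1])
        return type_of({**env, **delta}, e[2])
    raise TypeCheckError(f"unknown expression form {tag}")

def elab(env, d):
    """Return the mini-environment G' such that env |- d ==> G'."""
    tag = d[0]
    if tag in ("val", "rec"):                          # [val], [rec]
        _, i, t, e = d
        delta = {i: t}
        scope = {**env, **delta} if tag == "rec" else env
        if type_of(scope, e) != t:
            raise TypeCheckError(f"declared type of {i} does not match")
        return delta
    if tag == "dseq":                                  # [Dseq]
        g1 = elab(env, d[1])
        g2 = elab({**env, **g1}, d[2])
        return {**g1, **g2}                            # G1(+)G2
    raise TypeCheckError(f"unknown declaration form {tag}")

# fn (x,y):'a*'b => (fn f:'a*'b->'c => f(x,y))
f_type = fn_t(pair_t("'a", "'b"), "'c")
example = ("fnp", "x", "y", "'a", "'b",
           ("fn", "f", f_type,
            ("app", ("var", "f"), ("pair", ("var", "x"), ("var", "y")))))
example_type = type_of({}, example)
```

Here elab returns only the delta environment, and the [let] and [Dseq] branches do the overlay, as in the rules.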
Q: What about simultaneous binding? mutual recursion (and)?
G |- D1 ==> G1, G |- D2 ==> G2
[Dsim] ---------------------------- if G3 = G1 \uplus G2
G |- D1 & D2 ==> G3 and dom(G1) \cap dom(G2) = {}
G3 |- D1 ==> G1, G3 |- D2 ==> G2
[Drec] ---------------------------- if G3 = G1 \uplus G2
G |- D1 and D2 ==> G3 and dom(G1) \cap dom(G2) = {}
But the pattern matching syntax is important for getting bindings
to all the parts of the formal.
Generalization of the [fnp] rule for datatypes (omit)
T ::= datatype c2(x) of T2 | c3(y) of T3
G |- E1 : datatype c2(x) of T2 | c3(y) of T3,
G2 |- E2:T,
G3 |- E3:T if G2 = G,x:T2
[case] --------------------------------------- and G3 = G,y:T3
G |- (case E1 of c2(x) => E2 | c3(y) => E3): T
*** proofs
------------------------------------------
PROOFS OF TYPE CHECKING
[var] Gfxy |- x: 'a, Gfxy |- y :'b
[pair] ------------------------------
[var] Gfxy |- f :'a*'b->'c,
Gfxy |- (x,y) : 'a*'b
[app] ----------------------------
Gfxy |- f(x,y) : 'c
[fn] ----------------------------
Gxy |- (fn f:'a*'b->'c => f(x,y))
: ('a*'b->'c) -> 'c
[fnp] ----------------------------
ETE |- (fn (x,y):'a*'b =>
(fn f:'a*'b->'c => f(x,y)))
: 'a*'b -> (('a*'b->'c) -> 'c)
where ETE = {}
and Gxy = x:'a, y:'b
and Gfxy = f: 'a*'b->'c, x:'a, y:'b
------------------------------------------
Q: can you prove
G |- plus(x,3):int
where G = 3:int, plus: int*int -> int, x: int ?
** Reconstructing (Inferring) type declarations for monomorphic expressions
Basic ideas due to Hindley and Milner.
Made possible by unification
See also the Cardelli paper in the references
(Basic Polymorphic Typechecking)
*** example
------------------------------------------
TYPE RECONSTRUCTION EXAMPLE
[var] Gfxy |- x: , Gfxy |- y :
[pair] ------------------------------
[var] Gfxy |- f :
Gfxy |- (x,y) :
[app] ----------------------------
Gfxy |- f(x,y) :
[fn] ----------------------------
Gxy |- (fn f => f(x,y))
:
[fnp] ----------------------------
ETE |- (fn (x,y) => (fn f => f(x,y)))
:
where ETE = {}
and Gxy = x:'a, y:'b
and Gfxy = f:
x:'a, y:'b
CONSTRAINTS:
------------------------------------------
Write down constraints on types as equations
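Such equations can be solved by unification. The Python sketch below is one minimal way to do it, assuming types are encoded as strings (type variables start with a quote) or tuples such as ("->", S, T). The variable names 'f and 'c for the unknown type of f and the result of the application are choices made for this illustration.

```python
class UnifyError(Exception):
    """Raised when a set of type equations has no solution."""

def is_var(t):
    return isinstance(t, str) and t.startswith("'")

def resolve(t, subst):
    """Apply subst to t, following bindings of type variables fully."""
    if is_var(t):
        return resolve(subst[t], subst) if t in subst else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, subst) for a in t[1:])
    return t

def occurs(v, t):
    """True if type variable v appears inside type t."""
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a) for a in t[1:])

def unify(t1, t2, subst):
    """Return an extension of subst that makes t1 and t2 equal."""
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        if occurs(t1, t2):
            raise UnifyError(f"circularity: {t1} occurs in {t2}")
        return {**subst, t1: t2}
    if is_var(t2):
        return unify(t2, t1, subst)
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        for a, b in zip(t1[1:], t2[1:]):
            subst = unify(a, b, subst)
        return subst
    raise UnifyError(f"cannot unify {t1} with {t2}")

def solve(constraints):
    """Solve a list of equations (lhs, rhs) from left to right."""
    subst = {}
    for lhs, rhs in constraints:
        subst = unify(lhs, rhs, subst)
    return subst

# For f(x,y) with x:'a, y:'b, f:'f and result 'c, the [app] rule
# contributes the single equation 'f = 'a * 'b -> 'c.
constraints = [("'f", ("->", ("*", "'a", "'b"), "'c"))]
subst = solve(constraints)
whole = ("->", ("*", "'a", "'b"), ("->", "'f", "'c"))
reconstructed = resolve(whole, subst)
```

Under these assumptions, reconstructed has the same shape as the type derived in the explicitly typed proof earlier.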
*** inference rules without considering polymorphism
------------------------------------------
TYPE RECONSTRUCTION (INFERENCE)
PART 1
[var] G |- I: T if G(I) = T
G |- E1: S -> T, G |- E2: S
[app] ---------------------------
G |- E1(E2): T
G |- E1:bool, G |- E2:T, G |- E3:T
[if] ----------------------------------
G |- (if E1 then E2 else E3): T
[fn] ------------------------ if G' =
G |- (fn I => E): T1->T2
G |- D ==> G',
G'' |- E : T
[let] -------------------- if G'' = G(+)G'
G |- (let D in E): T
------------------------------------------
Q: What would be the hypothesis for the [fn] rule?
... G' |- E:T2, if G' = G,I:T1
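Read as an algorithm, the Part 1 rules generate equations while walking the expression. The Python sketch below covers only [var], [app], [if], and [fn], and only collects the equations without solving them. The tuple encoding and the fresh-variable names 't0, 't1, ... are assumptions of this illustration.

```python
import itertools

def make_fresh():
    """Return a function producing type variables 't0, 't1, ... in order."""
    counter = itertools.count()
    return lambda: f"'t{next(counter)}"

def infer(env, e, constraints, fresh):
    """Return a type for e under env, appending equations (lhs, rhs)
    to the list constraints as the rules demand them."""
    tag = e[0]
    if tag == "var":                       # [var]: G(I) = T
        return env[e[1]]
    if tag == "app":                       # [app]: E1 : S -> T, E2 : S
        t1 = infer(env, e[1], constraints, fresh)
        s = infer(env, e[2], constraints, fresh)
        t = fresh()
        constraints.append((t1, ("->", s, t)))
        return t
    if tag == "if":                        # [if]: E1 : bool, E2 : T, E3 : T
        t1 = infer(env, e[1], constraints, fresh)
        t2 = infer(env, e[2], constraints, fresh)
        t3 = infer(env, e[3], constraints, fresh)
        constraints.append((t1, "bool"))
        constraints.append((t2, t3))
        return t2
    if tag == "fn":                        # [fn]: G,I:T1 |- E : T2
        t1 = fresh()                       # unknown argument type
        t2 = infer({**env, e[1]: t1}, e[2], constraints, fresh)
        return ("->", t1, t2)
    raise ValueError(f"unknown expression form {tag}")

# fn f => fn x => f x
apply_fn = ("fn", "f", ("fn", "x", ("app", ("var", "f"), ("var", "x"))))
equations = []
apply_type = infer({}, apply_fn, equations, make_fresh())
```

The only difference from checking is that the argument type in [fn] is a fresh unknown rather than a declared type; the equations are then handed to a unifier.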
*** polymorphism (skip)
------------------------------------------
WHEN IS A FUNCTION POLYMORPHIC?
Is this a type error?
(fn f =>
g(f(2), f(true)))
What if f is:
val id = (fn x => x)
val succ = (fn x => x + 1)
------------------------------------------
In ML, the solution is that after you bind with val (or val rec),
mark type variables as generic (as in id)
thus within a function declaration,
the argument type is non-generic
but after the declaration it becomes generic
------------------------------------------
DON'T ALLOW CAPTURING
let val g = (fn x =>
let val f = (fn y => x)
in
if f(3)
then f(true)
else x + 5
end)
in
...
end
------------------------------------------
the problem with the above would be if we concluded that
f had type
all 'a . all 'b . ('a -> 'b)
instead of
all 'a . ('a -> int)
so can't mark a type variable as generic if it's being used in
the current type environment,
logically, don't want the all quantifier to capture 'b.
formally: a type variable occurring in the type of an expression E
is generic for a given scope if it does not occur
in the type of a lambda-variable declaration
that encloses the given scope
notation: in the type
all 'a . 'a -> 'b
'a is a generic type variable
The notation "all T . ..." is read "for all types T, ..."
non-generic type variables: type variables appearing in type
of a lambda-bound identifier
-shared among all occurrences of lambda-bound id (e.g., f)
-prevents heterogeneous use of lambda-bound identifiers
generic type variables: when used, replace by a normal type variable
in practice, each instance of
all 'a . 'a -> 'b
is of the form
'c -> 'b
where 'c is a fresh (not used elsewhere) non-generic type var
------------------------------------------
GENERIC TYPE VARIABLES
in the type: all 'a . 'a -> 'b
'a is generic, and 'b is not
[var] G |- id: all 'a . 'a -> 'a
[gElim] --------------------------------
G |- id: 'b -> 'b, G |- id: 'c -> 'c,
G |- 3: int, G |- true : bool
[app] ------------------------------------
G |- id(3):int, G |- id(true): bool
[pair] -----------------------------------
G |- (id(3), id(true)) : int * bool
where G = {id |-> (all 'a . 'a -> 'a),
3 |-> int, true |-> bool}
CONSTRAINTS:
'b = int
'c = bool
------------------------------------------
------------------------------------------
TYPE RECONSTRUCTION (INFERENCE)
GENERICS and BINDINGS
G |- E: T
[val] ---------------------- if G' = I:gen(T,G)
G |- (val I = E) ==> G'
G'' |- E1: T
[rec] ------------------- if G' = I:gen(T,G)
G |- rec I = E1 ==> G' and G'' = G,I:T
where
gen(T, G) = all V1 . ... . all Vk . T
if V1,...,Vk are all the
free type variables in T
that are not in range(G).
G |- I : all V . T
[gElim] ------------------ if T' = [T''/V]T and T'' is fresh
G |- I : T'
------------------------------------------
The notation [T''/V]T means substitution of T'' for free occurrences
of V in T.
Here T'' is arbitrary, but must be fresh.
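The functions gen and [gElim] can be sketched in Python as follows. The encoding ("all", (V1,...,Vk), T) for quantified types and the fresh-variable names 't0, 't1, ... are assumptions of this illustration, and instantiate applies [gElim] to all quantified variables at once.

```python
import itertools

def free_vars(t):
    """Free type variables of a type or of a quantified type."""
    if isinstance(t, str):
        return {t} if t.startswith("'") else set()
    if t[0] == "all":                       # ("all", (V1,...,Vk), T)
        return free_vars(t[2]) - set(t[1])
    result = set()
    for part in t[1:]:
        result |= free_vars(part)
    return result

def gen(t, env):
    """gen(T, G): quantify the free type variables of t not in range(env)."""
    env_vars = set()
    for bound in env.values():
        env_vars |= free_vars(bound)
    quantified = tuple(sorted(free_vars(t) - env_vars))
    return ("all", quantified, t) if quantified else t

def substitute(t, mapping):
    """[T''/V]T for every V |-> T'' in mapping (t has no quantifiers)."""
    if isinstance(t, str):
        return mapping.get(t, t)
    return (t[0],) + tuple(substitute(part, mapping) for part in t[1:])

def instantiate(scheme, fresh):
    """[gElim]: replace each quantified variable by a fresh one."""
    if isinstance(scheme, tuple) and scheme[0] == "all":
        return substitute(scheme[2], {v: fresh() for v in scheme[1]})
    return scheme

def make_fresh():
    counter = itertools.count()
    return lambda: f"'t{next(counter)}"

# id = fn x => x in the empty environment: 'a becomes generic.
id_scheme = gen(("->", "'a", "'a"), {})
# f = fn y => x where x:'b is lambda-bound: 'b must stay non-generic.
f_scheme = gen(("->", "'a", "'b"), {"x": "'b"})

fresh = make_fresh()
first_use = instantiate(id_scheme, fresh)
second_use = instantiate(id_scheme, fresh)
```

Each use of id gets its own fresh variable, so one use can later be constrained to int and another to bool without conflict.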
*** polymorphic rules (skip)
------------------------------------------
EXAMPLE
G |- rec map = (fn f => (fn ls =>
if (null ls) then []
else (f (hd ls))::(map f (tl ls))))
==>
where G is
(op ::): all 'a . 'a * 'a list -> 'a list,
[] : all 'a . 'a list,
null : all 'a . 'a list -> bool,
hd : all 'a . 'a list -> 'a,
tl : all 'a . 'a list -> 'a list
------------------------------------------
[lemma1] G' |- (null ls): bool,
[lemma2] G' |- []: 'h list ,
[lemma3] G' |- (f (hd ls))::(map f (tl ls)) : 'h list
[if] _______________________________________________________
G' |- if (null ls) then [] else (f (hd ls))::(map f (tl ls)) : 'h list
[fn] _______________________________________________________
G, map:'b, f:'c |- (fn ls =>
if (null ls) then [] else (f (hd ls))::(map f (tl ls)))
: 'g list -> 'h list
[fn] _______________________________________________________
G, map: 'b |- (fn f => (fn ls =>
if (null ls) then [] else (f (hd ls))::(map f (tl ls))))
: ('g -> 'h) -> 'g list -> 'h list
[rec] _______________________________________________________
G |- rec map = (fn f => (fn ls =>
if (null ls) then [] else (f (hd ls))::(map f (tl ls))))
==> G[map |-> ('g -> 'h) -> 'g list -> 'h list]
where G' = G, map: 'b, f : 'c, ls : 'd
constraints:
'b = ('g -> 'h) -> ('g list -> 'h list), 'c = 'g -> 'h,
'd = 'g list
Lemma 1: if 'd = 'g list, then G' |- (null ls): bool
Proof:
[var] G' |- null : all 'a. 'a list -> bool
[gElim] __________________________
G' |- null : 'g list -> bool, [var] G' |- ls : 'd
[app] ___________________________________________________________________
G' |- (null ls): bool
constraints: 'd = 'g list
QED
Have them do lemma 2 (lemma 3 takes too long).
Key ideas:
think of the checks to be made as either succeeding,
or constraining the result (by unification)
accumulate a set of constraints
types that are unconstrained at the end turn into universally
quantified (all 'a . TE('a)) types.
names bound with let (val or rec) are polymorphic,
but monomorphic instances are used in the program.
(see section 7.3 of Watt's Concepts and Paradigms book)
Formalization:
system of typings and equations, solve equations for unknowns
** Limitations
*** No recursive types:
type 'a stream = unit -> ('a * 'a stream)
no way to use this in the algorithm
solution: use abstraction to break the recursion
(think of above equation as an isomorphism)
up, down are provided implicitly
by the datatype constructor
(up when applied to values
down when used in pattern matching)
*** Polymorphic functions are not first-class objects!
type variables in a function parameter are not generic
------------------------------------------
NO POLYMORPHIC FUNCTIONS AS ARGUMENTS
- fun F f = fn (a,b) => (f(a), f(b));
val F = fn : ('a -> 'b)
-> 'a * 'a
-> 'b * 'b
- val W = (fn x => (x x));
std_in:1.18-1.20 Error:
operator is not a function
operator: 'Z
in expression:
(x x)
------------------------------------------
All arguments to functions in ML have one type
(which is the correct instance if possible).
so no self-application, no Y combinator
another example of this...
- fun S f g x = ((f x) (g x));
val S = fn : ('a -> 'b -> 'c) -> ('a -> 'b) -> 'a -> 'c
- fun I x = x;
val I = fn : 'a -> 'a
- S I I;
std_in:8.1-8.5 Error: operator and operand don't agree (circularity)
operator domain: ('Z -> 'Y) -> 'Z
operand: ('Z -> 'Y) -> 'Z -> 'Y
in expression:
S I I
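Both errors above come down to the same circular equation, 'Z = 'Z -> 'Y, which an occurs check rejects. A minimal Python sketch, assuming function types are encoded as tuples ("->", S, T):

```python
def occurs(var, ty):
    """True if type variable var appears anywhere inside ty."""
    if ty == var:
        return True
    return isinstance(ty, tuple) and any(occurs(var, part) for part in ty[1:])

# In (x x), x is used as the operand, of type 'Z,
# and as the operator, of type 'Z -> 'Y.
operand = "'Z"
operator = ("->", "'Z", "'Y")
self_application_is_circular = occurs(operand, operator)

# In S I I, matching the operator domain ('Z -> 'Y) -> 'Z against the
# operand ('Z -> 'Y) -> 'Z -> 'Y leaves the result parts to be equated.
domain = ("->", ("->", "'Z", "'Y"), "'Z")
actual = ("->", ("->", "'Z", "'Y"), ("->", "'Z", "'Y"))
s_i_i_is_circular = domain[1] == actual[1] and occurs(domain[2], actual[2])
```

A solution would need an infinite (recursive) type, which ties this back to the first limitation.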