CS 641 Lecture -*- Outline -*-

* Type reconstruction algorithm overview
     important as a review of compilation, abstract types

** Data Structures
*** environment (map from ids to types)
*** type expressions (type variables, type constructors)
     expressions may be represented by
       parenthesized strings,
       trees (exponentially large structures generated:
              h(f(X), f(X)) vs. h(f(g(a)), f(g(a)))
              have to substitute g(a) for X twice)
       graph (dag) that allows sharing of subterms
*** stack of type variables (allows generic/non-generic distinction)
*** substitutions

** Operations
*** environment (expand, get value, ...)
*** type variables
     -generic?
     -equal?
     -copy (=new)
*** type constructors
     -equal?
     -arity?
*** type expressions
     -is it a variable?
     -get type constructor and arguments (if not variable)
     -apply substitution
     -occurs? (variable in type expression)
     -copy
     -unify (make types equal)
*** stack of type variables (push, pop)
*** substitutions
     -create (singleton)
     -compose
     -apply (to a variable)

** Unification
     skip the details unless asked
*** Specification
**** Definitions
***** Terms, Term algebra
     V = set of variables
     Sigma = set of function symbols, not including "=", each with arity
          function symbols of arity 0 called constants
     Sigma and V are disjoint
     Term algebra (T_{Sigma}(V)) = smallest set containing
          v in V,
          constant in Sigma,
          f(t1,...,tn) for f in Sigma of arity n and terms t1,...,tn
     Ground terms (Herbrand Universe, T_{Sigma})
          = set of terms with no variables
          assume this set has >1 elements
     vars: syntactic thing -> set of variables in it
***** Substitutions
     substitution = finite map of variables to terms
          such that s(x) ~= x for any variable x in its domain
          (s leaves variables outside its domain unchanged)
     grounding subst is one that maps to ground terms
     extension to terms:
          s(t) = s(x)                   if t is variable x
                 a                      if t is constant a
                 f(s(t1),...,s(tn))     if t is f(t1,...,tn)
     exercise: define composition of substitutions such that
          for all terms t, (s1 o s2) t = s1(s2(t))
     Q: does this make the term algebra into a category?
          no, get category with
               objects: sets of terms
               arrows: substitutions
***** Unifiers, most general unifiers
     terms t1 and t2 are *unifiable* iff there is a substitution s
          such that s(t1) is syntactically identical to s(t2)
     such an s is called a *unifier* for t1 and t2
     a substitution is a *unifier for an equation set* iff
          it unifies the terms of each equation
     a substitution s is *idempotent* iff s = (s o s)
          Lemma: s is idempotent
               iff domain(s) does not intersect vars(range(s))
     ordering: s1 is *more general* than s2
          if there exists s3 s.t. s2 = (s3 o s1)
     s is a *most general unifier* (mgu) of a set of equations E
          iff s is a unifier of E and,
          whenever s1 is a unifier of E, s is more general than s1.
     Lemma: If s is an idempotent mgu of E,
          then for any unifier s1 of E, s1 = (s1 o s)
**** the problem
     given an equation e between terms
     return an mgu for e, or fail (if there is no unifier for e).
     can recover solution to any variables of interest from mgu
**** basic algorithm
------------------
fun unify(VAR x, VAR y) =
      if Variable.equal(VAR x, VAR y)
      then Substitution.null
      else Substitution.singleton(VAR x, VAR y)
  | unify(x as TERM(f,args), VAR y) = unify(VAR y, x)
  | unify(VAR x, y as TERM(f,args)) =
      if Term.occurs(VAR x, y)          (* occurs check *)
      then raise Failure
      else Substitution.singleton(VAR x, y)
  | unify(x as TERM(f,fargs), y as TERM(g,gargs)) =
      (* need the same constructor AND the same number of arguments *)
      if not (Constructor.equal(f,g)
              andalso length(fargs) = length(gargs))
      then raise Failure
      else
        let (* sigma = substitution accumulated so far.
               Term.apply stands for the "apply substitution" operation
               on type expressions listed under Operations (name assumed);
               sigma must be applied before unifying the next pair,
               e.g. for f(X,X) against f(a,b). *)
            fun unifyargs([], [], sigma) = sigma
              | unifyargs(t1::rest1, t2::rest2, sigma) =
                  let val tau = unify(Term.apply(sigma, t1),
                                      Term.apply(sigma, t2))
                  in unifyargs(rest1, rest2,
                               Substitution.compose(tau, sigma))
                  end
              | unifyargs(_, _, _) = raise Failure
        in unifyargs(fargs, gargs, Substitution.null)
        end;
------------------

** complexity
     Unification
          algorithms exist that run in time linear
               in size of terms to be unified
          is log-space complete for P,
               so hard to do in really small space
     But deciding whether an ML expression is well-typed is exponential!
          (complete for deterministic exponential time)
     Why?
----------------
- fun pair x y = (fn z => z x y);
val pair = fn : 'a -> 'b -> ('a -> 'b -> 'c) -> 'c
- let fun x0 z = z
  in let val x1 = pair x0 x0
     in let val x2 = pair x1 x1
        in x2
        end
     end
  end;
val it = fn : (((('a -> 'a) -> ('b -> 'b) -> 'c) -> 'c)
               -> ((('d -> 'd) -> ('e -> 'e) -> 'f) -> 'f) -> 'g) -> 'g
----------------
     Continuing this pattern (x3 = pair x2 x2, ...),
     the type includes Omega(2^{cn}) distinct type variables
     for an expression of size O(n).
----------------
- let fun x1 y = pair y y
  in let fun x2 y = x1(x1(y))
     in x2 (fn z => z)
     end
  end;
val it = fn : (((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b)
               -> ((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> 'c) -> 'c
----------------
     Continuing this pattern (x3 y = x2(x2(y)), ...),
     the type has length Omega(2^{2^{cn}}) when printed as a string,
     or, as a graph, has Omega(2^{cn}) nodes
     for an expression of size O(n).

     So (in essence), for a program of size O(n),
     the system of equations generated may have size O(2^n),
     even when represented compactly.
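
     A small sketch of the counting behind these bounds, in SML.
     Everything here is made up for illustration and is not part of
     any type checker: the datatype ty and the helpers treeSize,
     subterms, dagSize, numVars, fresh, instantiate, pairTy, nested, poly.
     poly mimics let-generalization by copying the type of x(n-1)
     with fresh variables at each use (as in the first transcript);
     nested reuses one type for both uses, which is the shape
     x1 builds each time it is applied (as in the second transcript).

```sml
(* Illustrative only: all names below are hypothetical. *)
datatype ty = TVar of string
            | Arrow of ty * ty

(* Node count when the type is written out as a tree (or a string). *)
fun treeSize (TVar _) = 1
  | treeSize (Arrow (a, b)) = 1 + treeSize a + treeSize b

(* Distinct subterms of t, added to seen.  Equal subterms are kept once,
   so the length is the node count of a dag with full sharing. *)
fun subterms (t, seen) =
    if List.exists (fn u => u = t) seen then seen
    else case t of
             TVar _ => t :: seen
           | Arrow (a, b) => t :: subterms (b, subterms (a, seen))

fun dagSize t = length (subterms (t, []))

(* Number of distinct type variables in t. *)
fun numVars t =
    length (List.filter (fn TVar _ => true | _ => false)
                        (subterms (t, [])))

(* Supply of fresh type variables t1, t2, ... *)
val counter = ref 0
fun fresh () = (counter := !counter + 1;
                TVar ("t" ^ Int.toString (!counter)))

(* Copy a type, renaming every variable consistently to a fresh one
   (i.e. treating all of its variables as generic). *)
fun instantiate t =
    let val env : (string * ty) list ref = ref []
        fun go (TVar v) =
              (case List.find (fn (w, _) => w = v) (!env) of
                   SOME (_, u) => u
                 | NONE => let val u = fresh ()
                           in env := (v, u) :: !env; u end)
          | go (Arrow (a, b)) = Arrow (go a, go b)
    in go t end

(* Type of (pair x y) for x : t1, y : t2 with result variable c,
   read off from pair : 'a -> 'b -> ('a -> 'b -> 'c) -> 'c. *)
fun pairTy (t1, t2, c) = Arrow (Arrow (t1, Arrow (t2, c)), c)

(* No generalization: both uses share one type. *)
fun nested 0 = Arrow (TVar "a", TVar "a")
  | nested n = let val t = nested (n - 1)
               in pairTy (t, t, TVar ("c" ^ Int.toString n)) end

(* Let-generalization: each use of x(n-1) gets its own fresh copy. *)
fun poly 0 = Arrow (TVar "a", TVar "a")
  | poly n = let val s = poly (n - 1)
             in pairTy (instantiate s, instantiate s, fresh ()) end
```

     With these definitions, numVars (poly 2) is 7, matching 'a .. 'g
     in the first transcript, and in general numVars (poly n) = 2^(n+1) - 1.
     treeSize satisfies s(n) = 2 s(n-1) + 5 for both nested and poly,
     so the printed type doubles at each level.
     dagSize (nested n) = 4n + 2 (sharing helps),
     but dagSize (poly n) satisfies d(n) = 2 d(n-1) + 4 (sharing does not help,
     since the two copies have no variables in common).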